
COACH: Consistency Aware Check-pointing for Nonvolatile Processor in Energy Harvesting Systems

December 2019
IEEE Transactions on Emerging Topics in Computing 9(4):1-1

DOI:10.1109/TETC.2019.2961007

Authors:

Amir Mahdi Hosseini Monazzah

Iran University of Science and Technology

Mostafa Bazzaz

Sharif University of Technology

Bardia Safaei

Sharif University of Technology




IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, VOL. XX, NO. XX, MONTH YEAR 1

COACH: Consistency Aware Check-pointing for Nonvolatile Processor in Energy Harvesting Systems

Ali Hoseinghorban∗, Amir Mahdi Hosseini Monazzah†‡, Mostafa Bazzaz∗,

Bardia Safaei∗, and Alireza Ejlali∗

Abstract—Recently, energy harvesting systems that use hybrid NVM-SRAM memory have been introduced as a promising alternative to battery-operated systems. Because the ambient input power of an energy harvesting system fluctuates as environmental conditions change, the system may suspend program execution until it has harvested enough energy to continue. Resuming a program after such a suspension may lead to data inconsistency and threaten the correct functionality of the program. In this paper, we propose COACH, an energy-efficient, consistency-aware memory scheme that guarantees the correct functionality and consistency of programs in an energy harvesting system. Experimental results show that COACH improves the forward-progress of programs by up to 60% compared with state-of-the-art consistency-aware approaches, without imposing considerable energy overhead on the system.

Index Terms—Data consistency, Energy-efficiency, Nonvolatile memory, Energy harvesting system.

1 INTRODUCTION

ENERGY harvesting systems are considered a replacement for battery-operated systems in situations where batteries are restricted (e.g., some wearable applications) or where charging them is infeasible or difficult. Energy harvesting systems are one of the main enablers of the market for IoT devices, which is anticipated to exceed 100 billion objects in the near future [1], [2], [3]. While these systems are a feasible solution where battery-operated systems cannot be applied, the unpredictability of ambient power leads to frequent power failures. Therefore, the state of the processor and memories must be backed up into nonvolatile storage so that, after restarting from a failure, they can return to the nearest possible state [4].

In capacitor-less energy harvesting systems, the output of the harvester powers the system directly. The characteristics of the energy source (such as solar, radio frequency radiation, or piezoelectricity) therefore have a significant impact on these systems. Furthermore, capacitor-less energy harvesting systems are operational only while the input power exceeds the power consumption of the system, so they are inactive most of the time [5]. To address this, state-of-the-art studies [6], [7], [8], [9] use a bulk capacitor to accumulate energy and power the system, so the system remains operational even when the input power is weak or unstable. In such systems, the processor is nonfunctional whenever the energy in the bulk capacitor (the input capacitor that stores the harvested energy) drops below a specific threshold. At the same time, the rate at which energy is harvested and the bulk capacitor is charged is uncontrollable and may often be low. Accordingly, programs that use the harvested energy inefficiently increase the rate of energy consumption and discharge the bulk capacitor faster. As a result, the processor spends more time in the nonfunctional state, and the forward-progress of the program (the number of validly executed instructions) is reduced.

∗Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.

†School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran.

‡School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.

Email: {hoseinghorban, bazzaz, bsafaei}@ce.sharif.edu, monazzah@iust.ac.ir, ejlali@sharif.edu.

Emerging nonvolatile memories (NVMs) such as STT-MRAM, PCM, and FRAM have promising features, including high density, low leakage power, byte-writable granularity, and non-volatility (the ability to retain data across a power failure). These features motivate system designers to use NVMs in energy harvesting systems to save the state of the processor when the input power fails. Despite these features, accessing NVMs is slower and consumes more energy than accessing volatile memories such as SRAM. Hybrid approaches that combine the strengths of volatile and nonvolatile memories are therefore more promising, and previous studies have shown that hybrid designs consume less energy than fully NVM designs [6], [10], [11]. Chip manufacturers have accordingly begun to develop such architectures. For example, the MSP430FR5964 by Texas Instruments® [12] uses a nonvolatile processor (NVP) with 256 KB of FRAM and 8 KB of SRAM.

However, a processor with hybrid NVM-SRAM memory in an energy harvesting system faces a challenge called data inconsistency. When a power failure occurs in such a system, the NVM keeps its modified data while the contents of the SRAM are lost. After the failure, the system rolls back to the last successful check-point and resumes the program from there. Merely resuming from the last successful check-point, however, may cause data inconsistency, because some data in the NVM has been modified after that check-point (we discuss the data inconsistency problem in more detail in Section 2.1).

To deal with data inconsistency, many studies, such as [9], [13], [14], place a check-point between each load-store pair that targets the same address. These approaches are slow and energy inefficient because they impose too many check-points on the program [15], [16]. DINO [17], ALPACA [15], and CHAIN [7] propose programming models in which the programmer partitions the program into a set of atomic tasks. The system either completes an atomic task without a power failure and commits its changes to memory, or discards all of its changes. To this end, on entry to each atomic task the system copies the consistency-sensitive data of the task to a buffer, and if the task fails to complete, the system discards the buffer (the double buffering technique). These studies assume that programmers know the energy consumption and the bulk capacitor of the platform precisely enough to decide the size of the atomic tasks, which is a non-trivial assumption. Furthermore, these approaches must commit (discard) the changes in the buffer on backup (recovery), which increases the backup and recovery overheads. To reduce programmer involvement, the approaches proposed in [2], [16] dynamically track the data modified during execution and, instead of the whole memory, back up (discard) only the data modified since the last check-point. However, tracking the modified data significantly increases the energy consumption of normal read and write operations (see Section 2.2).

In this paper, we propose COACH, an energy-efficient and consistency-aware memory management scheme for energy harvesting systems. Similar to double buffering techniques, COACH uses two memories to keep, for each data item, its value at the last successful check-point in addition to its latest value modified after that check-point. Accordingly, COACH can resolve the inconsistency problem after a power failure. COACH introduces a new mechanism for backup and rollback operations. Its main advantage over double buffering techniques such as [2], [7], [15], [16], [17] is that its backup and rollback operations are very fast and energy efficient, which improves forward-progress compared with state-of-the-art approaches. Furthermore, COACH is a pure hardware approach that guarantees consistency without programmer effort [7], [15], [17], compiler modification [9], [13], or operating system support [16]. COACH works with any check-pointing mechanism and does not force any additional check-points on the system.

The main contributions of this paper are as follows:

• We introduce a simple and effective memory management scheme that efficiently handles the data inconsistency problem of NVMs in energy harvesting systems without imposing any extra check-points on the system.

• We introduce new backup and rollback operations compatible with the proposed hardware, which are fast and energy efficient because COACH eliminates the need to discard (commit) modified data in the rollback (backup) operation.

• We explore the area and energy overheads of COACH and other state-of-the-art consistency-aware approaches.

• We investigate the effects of the bulk capacitor size, the operating frequency of the processor, and ambient power fluctuations on the efficiency of COACH.

0x100168 <Loop>:
0x100168: ldr r3, [ip, #4]!
0x10016c: ldr r2, [pc, #616]
0x100170: ror r3, r3, #31
0x100174: cmp ip, r2
0x100178: str r3, [ip]
0x10017c: bne 0x100168

(a) Assembly code of a loop in SHA application

Instruction Energy (nJ) | NVM Access Energy (nJ) | Processor Backup Energy (nJ)
0.16 | 0.22 | 19.50

Energy consumption of 100 iterations of the loop (μJ): Consistency-Unaware 0.18, InCoAv 2.13, InCoRe 1.62

(b) Energy consumption of the loop for three different approaches

Fig. 1. Energy overhead of the state-of-the-art consistency-aware approaches (InCoAv [9] and InCoRe [2]) compared with the Consistency-Unaware approach for a small loop in the SHA application.

Light is considered the source of ambient power in this study because it is widely used in energy harvesting systems and is available both indoors and outdoors. COACH is evaluated with weak, strong, stable, and unstable solar traces [8], [18]. The results show that, compared with the state-of-the-art consistency-aware energy harvesting systems proposed by Xie et al. [9] and Senni et al. [2], COACH improves forward-progress by 60% and 48%, respectively.

The rest of this paper is organized as follows. Section 2 presents a motivational example. Section 3 describes the details of COACH. Section 4 evaluates the efficiency of COACH. Section 5 reviews related work, and Section 6 concludes the paper.

2 OBSERVATIONS AND MOTIVATIONS

With the advent of energy harvesting NVPs, the inconsistency problem has become a significant challenge. In this section, we first explain the inconsistency problem of energy harvesting NVPs (Section 2.1), which is our main concern, and then discuss the challenges in state-of-the-art approaches (Section 2.2), which are our primary motivation for this study.

2.1 Inconsistency Problem

Here we use an example to show how the inconsistency problem threatens the system. Assume an energy harvesting NVP that runs the SHA application. A loop from the SHA code is shown in Fig. 1a. In each iteration of this loop, an element is read from the array (0x100168), rotated (0x100170), and stored back to the array (0x100178). In case of a power failure (the voltage of the input capacitor drops below a specific threshold), the system rolls back to the last valid check-point (which in this example is the one placed before the loop) and re-executes the program from there.

(a) Mobile device, at 01:00 PM

(b) Stationary device, at 08:00 PM

Fig. 2. Forward-progress and input capacitor voltage trace of executing the SHA application at two different hours of a day, (a) 01:00 PM and (b) 08:00 PM, for three different approaches (Consistency-Unaware, InCoAv [9], and InCoRe [2]). In each column, the top, middle, and bottom charts show the forward-progress (number of instructions, ×10^6), the bulk capacitor voltage (V), and the input total solar irradiance (W/m²), respectively, over time (s).

During the execution of the loop, some elements of the array reside in the SRAM-based cache, some have been written back to the nonvolatile memory (evicted from the cache), and the rest have not been modified yet. On a power failure, the cache loses its data while the nonvolatile memory keeps its contents. Re-executing the loop from the last check-point therefore causes an inconsistency, because some elements of the array have already been modified (rotated) in the nonvolatile memory and will be rotated a second time.
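The scenario described in this subsection can be reproduced with a minimal Python sketch. It is an illustration under stated assumptions, not the simulator used in this paper: the array contents, the helper names `rotate_right` and `run_loop`, and the choice of two written-back elements are hypothetical, and the cache is modeled only by the number of elements that reached the NVM before the failure.

```python
from typing import List, Optional


def rotate_right(value: int, amount: int, bits: int = 32) -> int:
    """Rotate a `bits`-wide unsigned integer right by `amount` positions."""
    amount %= bits
    mask = (1 << bits) - 1
    return ((value >> amount) | (value << (bits - amount))) & mask


def run_loop(nvm: List[int], written_back: Optional[int] = None) -> None:
    """Rotate array elements in place, like the `ror #31` loop of Fig. 1a.

    If `written_back` is given, power fails after that many elements have
    reached the NVM; the remaining updates are lost with the volatile cache.
    """
    limit = len(nvm) if written_back is None else written_back
    for i in range(limit):
        nvm[i] = rotate_right(nvm[i], 31)


initial = [0x00000001, 0x00000002, 0x00000004, 0x00000008]  # hypothetical data

# Reference result: one uninterrupted pass over the array.
expected = list(initial)
run_loop(expected)

# Failure case: two elements are already in the NVM when power is lost.
nvm = list(initial)
run_loop(nvm, written_back=2)

# Naive recovery: restart from the check-point before the loop, reusing the
# NVM contents as they are.
run_loop(nvm)

inconsistent = [i for i, (got, want) in enumerate(zip(nvm, expected)) if got != want]
print(inconsistent)  # indices of elements that were rotated twice
```

In this sketch the first two elements end up rotated twice, while the elements that never reached the NVM before the failure are correct.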

2.2 Challenges in the State of the Art Approaches

To tackle the inconsistency problem described in the previous subsection, NVPs need a proper rollback mechanism that, after a system failure, restores the modified entries of memory to their values at the last valid check-point. Existing studies that consider failed executions fall into two groups. The first group avoids inconsistency in nonvolatile memory by imposing a large number of check-pointing operations on the system (inconsistency avoidance, abbreviated InCoAv), as in the approaches proposed by Van Der Woude et al. [13] and Xie et al. [9], [14]. The second group keeps track of modified data and discards (commits) it after a failed (successful) execution (inconsistency recovery, abbreviated InCoRe), as in the approaches proposed by Ma et al. [19], Senni et al. [2], and Maeng et al. [16]. To show the energy inefficiency of both groups, we conducted a set of simulations on the SHA code shown in Fig. 1.

We consider [9] as the representative of the InCoAv check-pointing approaches. In [9], a check-point is placed between each pair of load and store instructions that target the same address. This strategy imposes a large number of check-points on the system. For example, in the small loop presented in Fig. 1a, this approach forces a check-point in every iteration of the loop, between the load (0x100168) and store (0x100178) instructions.
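As a rough illustration of why this placement rule is costly, the following Python sketch counts the check-points it would insert for a memory trace shaped like the loop in Fig. 1a. The function `place_checkpoints` and the trace encoding are hypothetical simplifications introduced here; they are not the compiler analysis of [9].

```python
def place_checkpoints(trace):
    """Return the indices of stores that need a check-point just before them.

    `trace` is a list of (op, address) pairs with op in {"load", "store"}.
    A check-point is required before a store whose address has been loaded
    since the previous check-point.
    """
    loaded = set()      # addresses loaded since the last check-point
    points = []
    for index, (op, address) in enumerate(trace):
        if op == "load":
            loaded.add(address)
        elif op == "store" and address in loaded:
            points.append(index)   # check-point goes right before this store
            loaded.clear()         # a check-point starts a fresh region
    return points


ITERATIONS = 100
trace = []
for i in range(ITERATIONS):
    element = ("array", i)                   # assumed: one array word per iteration
    trace.append(("load", element))          # ldr r3, [ip, #4]!
    trace.append(("load", ("literal", 0)))   # ldr r2, [pc, #616]
    trace.append(("store", element))         # str r3, [ip]

points = place_checkpoints(trace)
print(len(points))  # one check-point per loop iteration
```

Because every iteration loads and then stores the same array element, the rule yields one check-point per iteration, which is the behavior described above.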

Among the InCoRe check-pointing approaches, Senni et al. [2] proposed a memory scheme that uses a backup memory, a main memory, and an address buffer (all nonvolatile). At the start of the program, the contents of the backup and main memories are the same. During normal execution, data is read from and written to the main memory, and the address buffer keeps the addresses of the modified data. On a successful backup, the modified data is copied from the main memory to the backup memory; on a rollback, it is copied in the opposite direction. Among the studies in the second group, this approach is the most promising one for ultra-low-power energy harvesting systems. Therefore, we take the study by Senni et al. [2] as the representative of the InCoRe check-pointing approaches.

In the approach of [2], additional check-points may be added to the program when the address buffer is full. The backup and rollback overhead is relatively high because all of the modified data must be copied between the main and backup memories. Furthermore, on each write operation, the controller searches the address buffer to check whether the corresponding address is already present (if it is, the controller does not add the address again). Thus, the architecture presented by Senni et al. [2] significantly increases the energy consumed by write operations. In the following, for the sake of simplicity, we refer to the approach of Xie et al. [9] as InCoAv check-pointing and to the method proposed by Senni et al. [2] as InCoRe check-pointing.
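The sources of overhead named in this subsection can be made concrete with a small Python model of an InCoRe-style memory. This is a sketch built only from the description in this section, not the design of Senni et al. [2]: the class name `InCoReMemory`, its counters, and the buffer size used in the example are assumptions.

```python
class InCoReMemory:
    """Toy model: main memory, backup memory, and a bounded address buffer."""

    def __init__(self, initial, buffer_size):
        self.main = dict(initial)
        self.backup = dict(initial)      # identical at program start
        self.buffer_size = buffer_size
        self.addr_buffer = []            # addresses modified since last check-point
        self.forced_checkpoints = 0
        self.copied_words = 0

    def read(self, addr):
        return self.main[addr]

    def write(self, addr, value):
        # Every write searches the buffer for the address.
        if addr not in self.addr_buffer:
            if len(self.addr_buffer) == self.buffer_size:
                # Buffer full: an extra check-point is forced. A full system
                # would also save the processor state at this point.
                self.forced_checkpoints += 1
                self.checkpoint()
            self.addr_buffer.append(addr)
        self.main[addr] = value

    def checkpoint(self):
        # Commit: copy every modified word from main to backup memory.
        for addr in self.addr_buffer:
            self.backup[addr] = self.main[addr]
        self.copied_words += len(self.addr_buffer)
        self.addr_buffer.clear()

    def rollback(self):
        # Discard: copy every modified word back from backup to main memory.
        for addr in self.addr_buffer:
            self.main[addr] = self.backup[addr]
        self.copied_words += len(self.addr_buffer)
        self.addr_buffer.clear()


mem = InCoReMemory({addr: 0 for addr in range(8)}, buffer_size=4)
for addr in range(6):
    mem.write(addr, addr + 100)   # the fifth distinct address forces a check-point
mem.rollback()                    # power failure: undo the uncommitted writes
print(mem.forced_checkpoints, mem.copied_words, mem.read(0), mem.read(4))
```

In the model, both commit and discard cost grow with the number of modified words, and each write pays for a buffer search.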

Fig. 3. An overview of an energy harvesting system (a) and a COACH-enabled processor (b). Block labels in the figure: PV Module, Decouple Capacitor, Bulk Capacitor, Hysteresis Controller, Processing System, NVP, Cache (SRAM), Control Unit, Tag Mem (Loc and Time fields), and Main Memory with Mem[0] and Mem[1].

Fig. 1b shows the energy consumed by executing 100 iterations of the loop in Fig. 1a. The left bar shows the energy consumption of a Consistency-Unaware approach, which performs normal read and write operations without considering the consistency of the application. Thus, after a power failure, the Consistency-Unaware approach cannot restore the data modified in the nonvolatile memory. While it suffers from incorrect functionality, it is the ideal case in terms of energy consumption. The middle and right bars show the energy consumption of InCoAv and InCoRe (with a 64-word address buffer), respectively. Both state-of-the-art approaches guarantee the consistency and correct functionality of the application, but at the cost of a high energy overhead (Fig. 1b).
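The figures reported in Fig. 1 can be cross-checked with a short calculation. The Python sketch below assumes that the three bar values in Fig. 1b (0.18, 2.13, and 1.62 μJ) belong to Consistency-Unaware, InCoAv, and InCoRe in that order, and that InCoAv pays exactly one processor backup per loop iteration; the variable names are hypothetical.

```python
ITERATIONS = 100
BACKUP_NJ = 19.50   # processor backup energy from the table in Fig. 1

# Bar values of Fig. 1b in microjoules (assumed order: Unaware, InCoAv, InCoRe).
UNAWARE_UJ, INCOAV_UJ, INCORE_UJ = 0.18, 2.13, 1.62

# InCoAv: one check-point per iteration on top of the consistency-unaware run.
checkpoint_cost_uj = ITERATIONS * BACKUP_NJ / 1000.0
incoav_estimate_uj = UNAWARE_UJ + checkpoint_cost_uj

# InCoRe: extra energy per iteration implied by the reported bars.
incore_extra_nj_per_iter = (INCORE_UJ - UNAWARE_UJ) * 1000.0 / ITERATIONS

print(f"check-point cost: {checkpoint_cost_uj:.2f} uJ")
print(f"InCoAv estimate:  {incoav_estimate_uj:.2f} uJ (reported {INCOAV_UJ:.2f})")
print(f"InCoRe extra:     {incore_extra_nj_per_iter:.1f} nJ per iteration")
print(f"overhead factors: {INCOAV_UJ / UNAWARE_UJ:.1f}x, {INCORE_UJ / UNAWARE_UJ:.1f}x")
```

Under these assumptions, 100 backups at 19.50 nJ each account for 1.95 μJ, which added to the 0.18 μJ baseline reproduces the 2.13 μJ reported for InCoAv.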

To compare the three approaches from the energy harvesting perspective, we conducted several experiments with two solar irradiance profiles: one representing 01:00 PM (a mobile device with solar irradiance between 0 W/m² and 380 W/m²) and one representing 08:00 PM (a stationary device with 40 W/m² solar irradiance). For each scenario, the forward-progress, the voltage of the bulk capacitor, and the input irradiance are depicted in Fig. 2. Note that the processor input voltage in our experiments must be in the range of 2.5 V to 3 V. If the voltage of the bulk capacitor drops below 2.5 V, the hysteresis controller disconnects the processor from the capacitor, and when the capacitor voltage reaches 2.8 V, it connects the capacitor to the processor again (see Fig. 3).
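The switching rule of the hysteresis controller can be sketched in a few lines of Python. Only the two thresholds (2.5 V and 2.8 V) come from the text; the class name `HysteresisController`, the initial disconnected state, and the sample voltage trace are assumptions made for illustration.

```python
class HysteresisController:
    """Connects the processor at v_on and disconnects it below v_off."""

    def __init__(self, v_off=2.5, v_on=2.8):
        self.v_off = v_off
        self.v_on = v_on
        self.connected = False   # assumed: the system starts disconnected

    def update(self, v_cap):
        """Update the switch for a new capacitor voltage; return its state."""
        if self.connected and v_cap < self.v_off:
            self.connected = False      # voltage dropped below 2.5 V
        elif not self.connected and v_cap >= self.v_on:
            self.connected = True       # voltage recovered to 2.8 V
        return self.connected


controller = HysteresisController()
voltages = [2.4, 2.7, 2.8, 2.6, 2.5, 2.49, 2.7, 2.8]   # hypothetical trace
states = [controller.update(v) for v in voltages]
print(states)
```

The gap between the two thresholds keeps the processor from toggling on and off when the capacitor voltage hovers around a single cut-off value.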

Fig. 2a illustrates that InCoAv makes very slow forward-progress, since most of the energy is spent on unnecessary check-points. On the other hand, the inefficiency of write operations (searching a relatively large address buffer on each write access) and the high backup cost of InCoRe reduce forward-progress significantly when the input irradiance is low (see Fig. 2b). The capacitor voltage traces in Fig. 2a and Fig. 2b show that InCoAv discharges the bulk capacitor faster than InCoRe. It is worth noting that, although the Consistency-Unaware approach uses the energy of the capacitor more efficiently (the capacitor discharges more slowly and the processor executes more instructions), it cannot guarantee the correct functionality of the application; hence, the executed instructions are useless in many applications.

3 COACH APPROACH

Section 2 shows the importance of developing a novel energy-efficient scheme, i.e., COACH, that guarantees the consistency and correct functionality of tasks during the different energy absorption states of NVPs while providing considerable forward-progress for applications. In this section, we first describe the proposed memory scheme and its corresponding algorithms in Section 3.1. Then, we explore the functionality of COACH with a case study in Section 3.2. Finally, we discuss the possible overheads of COACH in Section 3.3.

3.1 COACH’s Proposed Hardware Scheme

Fig. 3 depicts an overview of a typical energy harvesting system and the proposed COACH architecture. COACH uses a nonvolatile processor connected to an SRAM-based cache. In the memory hierarchy, besides this cache, the main memory in COACH consists of two NVMs that keep the current value of the application data as well as its value at the last successful check-point. To reduce backup and recovery overheads and resolve the inconsistency problem, COACH is equipped with a control unit that contains two registers to keep track of the total backups and rollbacks in the system. In addition, COACH uses an NVM as a tag memory (Tag Mem), which has an entry for each data item to keep the state of that data. The modified parts in Fig. 3b are highlighted with hatching. The peripheral modules required for energy absorption are depicted in Fig. 3a.

While COACH is capable of working with any check-pointing approach, for the sake of generality and simplicity we consider the check-pointing approach proposed by Liu et al. [20], which uses a watchdog timer that raises a backup signal periodically. During normal execution of the program, evicted dirty cache blocks are written back to the NVM. As discussed in the previous sections, after a system failure the system rolls back to the nearest successful check-point. However, to avoid the inconsistency problem, the system needs a proper recovery mechanism that reverts the modified data in the NVM to their values at the last successful check-point [9].

Accordingly, when a data entry is modified in the NVM (we refer to the new value as TempValue), the valid value of the data at the last successful check-point (we refer to it as LastChkptValue) must be kept, so that the system can discard the changes in the NVM after a failure. To this end, COACH uses a duplicated memory (Mem[0] and Mem[1] in Fig. 3b) to keep both the TempValue and LastChkptValue versions of each data item. To mitigate the overhead of committing (or discarding) the TempValue of the modified data on a successful backup (or rollback), COACH uses an NVM tag memory (Tag Mem). Each row of this memory includes two fields, called Loc and Time. For each data block written to memory during normal execution, the location (Mem[0] or Mem[1]) and the time (count of backups and rollbacks) of the update are saved in the Loc and Time fields, respectively.

To control the backup and rollback operations, COACH uses the control unit depicted in Fig. 3b. This control unit redirects memory requests to the proper memory module (Mem[0] or Mem[1]) based on the validity of the data in the memories. To this end, it is equipped with two nonvolatile registers: 1) the Total Backup and Rollback Counter (TBR Cntr), and 2) the Successive Rollback Counter (SR Cntr). While TBR Cntr keeps track of the total backups and rollbacks from the start of the program up to the last successful check-point, SR Cntr counts the number of successive rollbacks since the last successful backup. TBR Cntr and SR Cntr are nonvolatile registers because their contents must survive a power failure. In state-of-the-art nonvolatile processors [21], [22], [23], nonvolatile registers are built from NVFFs (an NVFF is a CMOS flip-flop attached to a nonvolatile cell) to improve the energy and performance of the processor. Therefore, during execution, the data is accessed through the CMOS flip-flops, and on a backup (restore), the CMOS flip-flops are written to (read from) the nonvolatile cells.

Algorithm 1 Control Unit Procedure
Input: Request (Req), Request Address (Addr), Request Data (Data), TBR Cntr Register (TBR Cntr), SR Cntr Register (SR Cntr), Tag Mem (Tag), Memory (Mem).
 1: procedure MEMORY ACCESS
 2:     LocTag = Tag[Addr].Loc
 3:     TimeTag = Tag[Addr].Time
 4:     if Req is a Read request then
 5:         if TimeTag < (TBR Cntr + SR Cntr + 1) and TimeTag > TBR Cntr then
 6:             Data = Mem[not LocTag][Addr]
 7:         else
 8:             Data = Mem[LocTag][Addr]
 9:         end if
10:     else if Req is a Write request then
11:         if TimeTag ≤ TBR Cntr then
12:             Mem[not LocTag][Addr] = Data
13:             Tag[Addr] = {not LocTag, TBR Cntr + SR Cntr + 1}
14:         else
15:             Mem[LocTag][Addr] = Data
16:             Tag[Addr] = {LocTag, TBR Cntr + SR Cntr + 1}
17:         end if
18:     else if Req is a Backup request then
19:         TBR Cntr = TBR Cntr + SR Cntr + 1
20:         SR Cntr = 0
21:     else if Req is a Rollback request then
22:         SR Cntr = SR Cntr + 1
23:     else // Flush request
24:         Tag[∗].Time = 0
25:         TBR Cntr = 0
26:         SR Cntr = 0
27:     end if
28: end procedure

To handle the inconsistency problem, COACH needs new mechanisms for read, write, backup, and rollback operations. In the following, we introduce these modifications to a system equipped with COACH. To handle read and write operations, we define three states for each data item residing in Mem[0] or Mem[1]:

• safe: Data has not been modified since the last successful check-point.

• modified: Data has been modified during the current execution.

• failed: Data has been modified in a failed execution.

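Fig. 4. The state transition diagram for each entry of the memory. States: Safe, Modified, Failed. Transitions: Safe to Modified on Write; Modified to Safe on Backup; Modified to Failed on Rollback; Failed to Modified on Write. Self-loops: Read and Flush on Safe; Read and Write on Modified; Read on Failed.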

Upon each memory access, the control unit determines the state of the accessed data using the TBR Cntr and SR Cntr registers in the control unit and the Time field of the data in the Tag Mem. If the Time of a data entry is less than or equal to TBR Cntr, the last modification of the data occurred before the last successful check-point; thus, the data is in the safe state. If the Time of a data entry is greater than TBR Cntr + SR Cntr, the data is in the modified state. Finally, if the Time of a data entry is greater than TBR Cntr and less than or equal to TBR Cntr + SR Cntr, the data is in the failed state. The control unit determines whether the LastChkptValue of each data item is in Mem[0] or Mem[1] using the Loc field in the Tag Mem. For a data entry in the safe state, the Loc field points to the memory module that contains the LastChkptValue; in the modified and failed states, it points to the TempValue. With this information, COACH can back up and recover the NVM quickly and with low energy overhead. Algorithm 1 presents the procedure used in the control unit of COACH for the read, write, backup, rollback, and flush operations.
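The state test performed by the control unit can be written as a small Python function. This is a sketch, not the hardware logic of COACH: the names `State` and `classify` are hypothetical, and the boundaries are taken as inclusive. An entry whose Time equals TBR Cntr counts as safe (as the all-zero initial state of the case study in Section 3.2 requires), and an entry whose Time equals TBR Cntr + SR Cntr counts as failed (matching the test Time < TBR Cntr + SR Cntr + 1 on line 5 of Algorithm 1).

```python
from enum import Enum


class State(Enum):
    SAFE = "safe"
    MODIFIED = "modified"
    FAILED = "failed"


def classify(time_tag: int, tbr_cntr: int, sr_cntr: int) -> State:
    """Derive the state of one entry from its Time field and the two counters."""
    if time_tag <= tbr_cntr:
        return State.SAFE          # last written before the last check-point
    if time_tag <= tbr_cntr + sr_cntr:
        return State.FAILED        # written in an execution that was rolled back
    return State.MODIFIED          # written in the current execution


# A write stamps the entry with TBR Cntr + SR Cntr + 1.
print(classify(0, tbr_cntr=0, sr_cntr=0))   # untouched entry at program start
print(classify(1, tbr_cntr=0, sr_cntr=0))   # just written
print(classify(1, tbr_cntr=1, sr_cntr=0))   # same entry after a backup
print(classify(2, tbr_cntr=1, sr_cntr=1))   # written, then rolled back
```

No per-entry work is needed to change states: a backup or rollback only changes the counters that the comparison uses.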

3.1.1 COACH’s Read/Write Operation

For a read operation, COACH uses the requested address to look up the location and time of the last update and thereby determine the state of the data (lines 2-3 in Algorithm 1). If the requested data is in the failed state, the LastChkptValue of the data is read from Mem[not LocTag] (lines 5-6 in Algorithm 1); otherwise, the last modified value of the data is read from Mem[LocTag] (lines 7-8 in Algorithm 1).

For a write request, if the requested data is in the safe state, the new data is written to Mem[not LocTag] (lines 11-12 in Algorithm 1), which does not contain the LastChkptValue of the data; otherwise, it is written to Mem[LocTag] (lines 14-15 in Algorithm 1), which contains the version of the data modified in the last failed execution (FailedValue) or in the current execution (TempValue). Thus, the LastChkptValue of the data is never overwritten, and after a system failure COACH can easily roll back to the last successful check-point without causing any inconsistency. After each write, the control unit updates the Tag Mem to keep track of the latest modification (lines 13 and 16 in Algorithm 1).

3.1.2 COACH’s Backup/Rollback Operation

Backup and rollback operations in COACH are simple and fast and do not consume a considerable amount of energy. During a backup operation, COACH simply increments TBR Cntr by the value of SR Cntr plus one and resets SR Cntr (lines 18-20 in Algorithm 1). As a result, all data in the modified state changes to the safe state (the Time of the modified data is no longer greater than TBR Cntr).

When the processor wakes up after a failure, the rollback procedure in Algorithm 1 is executed. COACH only increments the SR Cntr register by one (lines 21-22 in Algorithm 1). Accordingly, all data in the modified state changes to the failed state (the Time of the modified data is greater than TBR Cntr but no longer greater than TBR Cntr plus SR Cntr).
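The read, write, backup, and rollback operations of Sections 3.1.1 and 3.1.2 can be combined into a compact software model. The Python class below is a sketch under stated assumptions rather than the COACH hardware: `CoachControlUnit`, the dictionary-based memories, and the data labels are hypothetical, and an entry whose Time equals TBR Cntr is treated as safe so that the all-zero initial state is safe.

```python
class CoachControlUnit:
    """Software model of the control unit with two memories and a tag memory."""

    def __init__(self, initial):
        self.mem = [dict(initial), {}]                 # Mem[0] and Mem[1]
        self.tag = {addr: (0, 0) for addr in initial}  # addr -> (Loc, Time)
        self.tbr_cntr = 0
        self.sr_cntr = 0

    def read(self, addr):
        loc, time = self.tag[addr]
        # Failed entries return the value kept in the other module.
        failed = self.tbr_cntr < time < self.tbr_cntr + self.sr_cntr + 1
        return self.mem[1 - loc][addr] if failed else self.mem[loc][addr]

    def write(self, addr, data):
        loc, time = self.tag[addr]
        if time <= self.tbr_cntr:
            loc = 1 - loc          # safe: keep LastChkptValue, use the other module
        self.mem[loc][addr] = data
        self.tag[addr] = (loc, self.tbr_cntr + self.sr_cntr + 1)

    def backup(self):
        # Commit everything written so far by moving the counters only.
        self.tbr_cntr += self.sr_cntr + 1
        self.sr_cntr = 0

    def rollback(self):
        # Discard everything written since the last backup.
        self.sr_cntr += 1


cu = CoachControlUnit({"A1": "old"})
cu.write("A1", "lost")        # modified, then power fails
cu.rollback()
after_failure = cu.read("A1")  # check-point value is visible again
cu.write("A1", "new")
cu.backup()
cu.rollback()                  # a later failure must not undo committed data
after_commit = cu.read("A1")
print(after_failure, after_commit)
```

In this model, `backup` and `rollback` touch neither the memories nor the tag memory, which is why their cost does not depend on how much data was modified.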

3.1.3 COACH’s Flush Operation

COACH includes a flush operation to prevent overflow of the TBR Cntr or SR Cntr registers. The procedure used during the flush operation is shown in Algorithm 1 (lines 23-26). The flush operation is intrinsically energy-consuming, since it must set the Time field of every Tag Mem entry to zero (line 24 in Algorithm 1). However, because the flush operation is rarely performed (with a 2-byte TBR Cntr register, the system performs a flush after every 65535 backup and rollback operations), the energy overhead it imposes on the system is negligible. We discuss the energy penalty of flush operations further in Section 3.3. Finally, Fig. 4 shows the state diagram for each entry of the memory considering read, write, backup, rollback, and flush operations.

3.2 Case Study Example

To explore the efﬁciency of COACH, we consider a

case study scenario that includes a sequence of write,

read, successful check-points, and system failure in a

COACH-enabled system. Fig. 5 depicts the sequence of the

instructions (Fig. 5(a)) and the contents of different entries

of COACH’s storage elements (Fig. 5(b)-(l)) during the

execution of these instructions. Initially, all data are stored

in Mem[0] and TBR Cntr register, SR Cntr register, Loc and

Time for each entry of the memory are set to zero which

means all the data are in a safe state (Fig. 5(b)).

The first instruction performs a write operation on the A2 entry, which is in the safe state. Thus, the new data (D22) is written to Mem[1] because Tag[A2]Loc points to Mem[0] (the LastChkptValue of the data is in Mem[0]). Then, Tag[A2]Loc and Tag[A2]Time are updated, and the state of A2 is changed to modified (Fig. 5(c)). The next instruction is a write operation to A1, which is executed in the same way as the first instruction (Fig. 5(d)). The third instruction is another write operation on the A2 entry. Since A2 is in the modified state, the new data (D222) is written to Mem[1] because Tag[A2]Loc points to it (Fig. 5(e)).

Assuming that the backup instruction depicted in Fig. 5(a) has been executed successfully, TBR Cntr is incremented by the value of SR Cntr (which is zero) plus one, and as a result, the state of all entries is changed to safe (Fig. 5(f)). After the backup instruction, a request to read A1 must be satisfied. Since A1 is in the safe state and Tag[A1]Loc points to Mem[1], the data in Mem[1] is returned (Fig. 5(g)). Because the backup succeeded, the sixth instruction performs a write on a data entry in the safe state, like the first two instructions (Fig. 5(h)).

[Fig. 5 panels. Entries A0 and A3 keep Loc = 0, Time = 0 and their initial data (D0, D3) in Mem[0] in every panel. Data cells are given as Mem[0] / Mem[1]; "-" marks a cell that is not shown in the panel.]

(a) Instructions: 1. Write A2 D22; 2. Write A1 D11; 3. Write A2 D222; 4. Backup; 5. Read A1; 6. Write A1 D111; 7. Backup

Panel  Step               A1 Loc  A1 Time  A1 data     A2 Loc  A2 Time  A2 data    TBR_Cntr  SR_Cntr
(b)    Initial            0       0        D1 / -      0       0        D2 / -     0         0
(c)    Write A2 D22       0       0        D1 / -      1       1        D2 / D22   0         0
(d)    Write A1 D11       1       1        D1 / D11    1       1        D2 / D22   0         0
(e)    Write A2 D222      1       1        D1 / D11    1       1        D2 / D222  0         0
(f)    Successful Backup  1       1        - / D11     1       1        - / D222   1         0
(g)    Read A1 (D11)      1       1        - / D11     1       1        - / D222   1         0
(h)    Write A1 D111      0       2        D111 / D11  1       1        - / D222   1         0
(i)    System Failure     0       2        D111 / D11  1       1        - / D222   1         1
(j)    Read A1 (D11)      0       2        D111 / D11  1       1        - / D222   1         1
(k)    Write A1 D111      0       3        D111 / D11  1       1        - / D222   1         1
(l)    Successful Backup  0       3        D111 / -    1       1        - / D222   3         0

Legend: Entry in Safe State, Entry in Modified State, Entry in Failed State; Last Value of Data, TempValue of Data, FailedValue of Data.

Fig. 5. An example of COACH functionality and how it can avoid inconsistency in case of system failure.

Now, assume that a system failure occurs after executing the sixth instruction. COACH restores the memory content to the previous valid check-point only by incrementing SR Cntr (Fig. 5(i)). This time, executing the fifth instruction after recovery leads to a read from a failed entry, so the LastChkptValue of the data, which resides in Mem[1] (the memory that Tag[A1]Loc does not point to), is returned (Fig. 5(j)). The sixth instruction is now a write on a failed entry, and the new data (D111) is written to Mem[0], to which Tag[A1]Loc points (Fig. 5(k)). Finally, after a successful backup, the attribute of all data is changed to the safe state (Fig. 5(l)).
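The walk-through above can be replayed with a small software model. The following is a minimal sketch, not Algorithm 1 itself: the class name `CoachMemory`, the time stamp `TBR_Cntr + SR_Cntr + 1` written on every store, and the inclusive state comparisons are assumptions inferred from the values shown in Fig. 5. It is only meant to reproduce the Fig. 5 trace.

```python
SAFE, MODIFIED, FAILED = "safe", "modified", "failed"


class CoachMemory:
    """Toy model of Mem[0], Mem[1], Tag Mem (Loc, Time) and the two counters."""

    def __init__(self, initial):
        # All check-pointed data start in Mem[0]; Mem[1] is empty.
        self.mem = [dict(initial), {addr: None for addr in initial}]
        self.loc = {addr: 0 for addr in initial}    # Tag[addr].Loc
        self.time = {addr: 0 for addr in initial}   # Tag[addr].Time
        self.tbr_cntr = 0
        self.sr_cntr = 0

    def state(self, addr):
        # Inclusive bounds are an assumption inferred from Fig. 5.
        t = self.time[addr]
        if t <= self.tbr_cntr:
            return SAFE
        if t <= self.tbr_cntr + self.sr_cntr:
            return FAILED
        return MODIFIED

    def read(self, addr):
        loc = self.loc[addr]
        if self.state(addr) == FAILED:
            loc = 1 - loc  # LastChkptValue is where Loc does NOT point
        return self.mem[loc][addr]

    def write(self, addr, value):
        if self.state(addr) == SAFE:
            # Preserve the check-pointed copy: switch to the other memory.
            self.loc[addr] = 1 - self.loc[addr]
        # Modified or failed entry: overwrite the copy Loc already points to.
        self.mem[self.loc[addr]][addr] = value
        self.time[addr] = self.tbr_cntr + self.sr_cntr + 1  # assumed stamp

    def backup(self):
        self.tbr_cntr += self.sr_cntr + 1
        self.sr_cntr = 0

    def rollback(self):
        self.sr_cntr += 1


# Replay of the Fig. 5 sequence, including the failure after instruction 6.
m = CoachMemory({"A0": "D0", "A1": "D1", "A2": "D2", "A3": "D3"})
m.write("A2", "D22")                  # Fig. 5(c)
m.write("A1", "D11")                  # Fig. 5(d)
m.write("A2", "D222")                 # Fig. 5(e)
m.backup()                            # Fig. 5(f)
read_before_failure = m.read("A1")    # Fig. 5(g)
m.write("A1", "D111")                 # Fig. 5(h)
m.rollback()                          # Fig. 5(i): wake-up after the failure
state_after_failure = m.state("A1")
read_after_failure = m.read("A1")     # Fig. 5(j)
m.write("A1", "D111")                 # Fig. 5(k)
m.backup()                            # Fig. 5(l)
```

In this model neither `backup()` nor `rollback()` touches any data or tag entry; only the two counters change, which is the property the case study illustrates.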

3.3 Discussion on the COACH Overheads

In this subsection, we discuss the possible overheads of COACH. To this end, we compare COACH with other state-of-the-art approaches, i.e., the approach presented by Xie et al. [9] (as a representative of the InCoAv approaches) and the approach presented by Senni et al. [2] (as a representative of the InCoRe approaches), from the energy consumption and area perspectives.

In the following, for the sake of brevity, we only discuss the energy consumption of the main memory (NVM). The energy consumption of non-memory instructions and the effects of cache memories on the energy consumption of the system are not included in the provided equations because they are exactly the same in all of the approaches (note that we have considered them in our experiments).

The energy consumption of read, write, and check-point operations in [9] is calculated as follows:

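\[ \mathit{Read\_Eng}_{\mathit{InCoAv}} = \mathit{Reads} \times \mathit{NVMRd\_Eng} \tag{1} \]

\[ \mathit{Write\_Eng}_{\mathit{InCoAv}} = \mathit{Writes} \times \mathit{NVMWr\_Eng} \tag{2} \]

\[ \mathit{Chkp\_Eng}_{\mathit{InCoAv}} = (\mathit{Chkps} + \mathit{LdSt\_Chkps}) \times \mathit{Chkps\_Eng} \tag{3} \]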

where Reads, NVMRd_Eng, NVMWr_Eng, Writes, Chkps, Chkps_Eng, and LdSt_Chkps represent the number of reads from NVM, the energy per read access to NVM, the energy per write access to NVM, the number of writes to NVM, the number of watchdog timer check-points, the processor backup energy consumption, and the number of load-store pairs to one address, respectively. While [9] does not affect the read and write energy consumption of the system, it imposes a large number of extra and unnecessary check-points (LdSt_Chkps), which increases the energy consumption of the system.

On the other hand, the energy consumption of read, write, and check-point operations in [2] is calculated as follows:

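\[ \mathit{Read\_Eng}_{\mathit{InCoRe}} = \mathit{Reads} \times \mathit{NVMRd\_Eng} \tag{4} \]

\[ \mathit{Write\_Eng}_{\mathit{InCoRe}} = \mathit{Buf\_Hit} \times (\mathit{NVMWr\_Eng} + \mathit{Srch\_Buf}) + \mathit{Buf\_Miss} \times (2 \times \mathit{NVMWr\_Eng} + \mathit{Srch\_Buf}) \tag{5} \]

\[ \mathit{Chkp\_Eng}_{\mathit{InCoRe}} = \mathit{Chkps} \times \mathit{Chkps\_Eng} + \mathit{Buf\_Miss} \times (2 \times \mathit{NVMRd\_Eng} + \mathit{NVMWr\_Eng}) + \mathit{Buf\_Overflow\_Chkps} \times \mathit{Chkps\_Eng} \tag{6} \]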

where Reads, NVMRd_Eng, NVMWr_Eng, Buf_Miss, Buf_Hit, Srch_Buf, Chkps, Chkps_Eng, and Buf_Overflow_Chkps represent the number of reads from NVM, the energy per read access to NVM, the energy per write access to NVM, the number of write access miss events in the address buffer, the number of write access hit events in the address buffer, the energy consumption of searching the address buffer, the number of watchdog timer check-points, the processor backup energy consumption, and the number of times the system forces a check-point because the address buffer is full, respectively.

The InCoRe method [2] does not increase the energy consumption of read accesses because it returns data from the main memory. On the other hand, on each write access it needs to search the address buffer, and if it cannot find the data (Buf_Miss), then in addition to updating the data it needs to write the address of the data to the address buffer (one more NVM access is required in case of a Buf_Miss). The approach proposed by Senni et al. [2] introduces additional check-points when the address buffer becomes full (Buf_Overflow_Chkps). Therefore, the size of the address buffer is an important parameter that introduces a trade-off between the number of additional check-points and the cost of searching the address buffer. Furthermore, at each check-point, the modified data (for each Buf_Miss, an entry is added to the address buffer, which needs to be backed up at the next check-point) must be written back to the backup memory, which leads to a high backup cost in this approach (for each modified datum, three NVM accesses are required because the controller needs to read the address from the address buffer, read the corresponding data from the main memory, and write it to the backup memory).

TABLE 1
Overhead of FRAM area on total price of a micro controller.

Part          FRAM (KB)  SRAM (KB)  Price (US$)  Price Overhead (%)
MSP430FR5731  4          1          1.20         -
MSP430FR5735  8          1          1.26         5.0
MSP430FR5739  16         1          1.38         9.5
MSP430FR6977  64         2          3.20         -
MSP430FR6979  128        2          3.35         4.7
MSP430FR5962  128        8          2.85         -
MSP430FR5964  256        8          3.25         14.1

Finally, the energy consumption of read, write, and

check-point operations in COACH are calculated as follows:

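\[ \mathit{Read\_Eng}_{\mathit{COACH}} = \mathit{Reads} \times (3 \times \mathit{NVMRd\_Eng}) \tag{7} \]

\[ \mathit{Write\_Eng}_{\mathit{COACH}} = \mathit{Writes} \times (2 \times \mathit{NVMWr\_Eng} + \mathit{NVMRd\_Eng}) \tag{8} \]

\[ \mathit{Chkp\_Eng}_{\mathit{COACH}} = \mathit{Chkps} \times \mathit{Chkps\_Eng} + \mathit{Flush\_Eng} \tag{9} \]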

where Reads, NVMRd_Eng, NVMWr_Eng, Writes, Chkps, Chkps_Eng, and Flush_Eng represent the number of reads from NVM, the energy per read access to NVM, the energy per write access to NVM, the number of writes to NVM, the number of watchdog timer check-points, the processor backup energy consumption, and the energy consumption of flushing Tag Mem, respectively. For a read operation, a data entry is fetched from each memory (Mem[0] and Mem[1]) and from the Tag Mem; hence, each read operation needs three accesses to the NVM memory. In each write operation, COACH reads Tag Mem, writes the new data to the target memory, and updates Tag Mem; as a result, the write operation also requires three accesses to the NVM memory (one read and two writes).

According to the above discussion and the formulation, we can conclude that InCoAv [9] does not increase the energy consumption of read and write accesses to the NVM memory. However, the experiments show that in most of the benchmarks the number of check-points due to load-store pairs is high, which decreases the energy efficiency and forward progress of the system. InCoRe [2] needs to search the address buffer on each write access; therefore, it performs poorly in write-intensive benchmarks. The results show that COACH improves forward progress in almost all the benchmarks, except those in which the number of loads is significantly higher than the number of stores. This happens because in COACH each read needs to fetch from Mem[0], Mem[1], and Tag Mem, which increases the energy consumption of a read operation compared to the two other approaches (see Section 4).
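To make the comparison concrete, Equations (1)-(9) can be evaluated side by side. The sketch below is illustrative only: the per-access energies are the FRAM read, FRAM write, and processor backup values listed in Table 2, while the workload counts, the buffer-search energy `srch_buf`, and the names `Workload`, `incoav`, `incore`, and `coach` are hypothetical and not taken from the experiments.

```python
from dataclasses import dataclass

# Energies in nJ, taken from Table 2.
NVM_RD_ENG = 0.22   # FRAM read access energy
NVM_WR_ENG = 0.34   # FRAM write access energy
CHKPS_ENG = 19.50   # processor backup energy


@dataclass
class Workload:
    reads: int
    writes: int
    chkps: int                    # watchdog timer check-points
    ldst_chkps: int = 0           # InCoAv: check-points forced by load-store pairs
    buf_hit: int = 0              # InCoRe: writes that hit the address buffer
    buf_miss: int = 0             # InCoRe: writes that miss the address buffer
    buf_overflow_chkps: int = 0   # InCoRe: check-points forced by a full buffer
    srch_buf: float = 0.0         # InCoRe: energy of one buffer search (nJ)
    flush_eng: float = 0.0        # COACH: total Tag Mem flush energy (nJ)


def incoav(w):
    read = w.reads * NVM_RD_ENG                                  # Eq. (1)
    write = w.writes * NVM_WR_ENG                                # Eq. (2)
    chkp = (w.chkps + w.ldst_chkps) * CHKPS_ENG                  # Eq. (3)
    return read, write, chkp


def incore(w):
    read = w.reads * NVM_RD_ENG                                  # Eq. (4)
    write = (w.buf_hit * (NVM_WR_ENG + w.srch_buf)
             + w.buf_miss * (2 * NVM_WR_ENG + w.srch_buf))       # Eq. (5)
    chkp = (w.chkps * CHKPS_ENG
            + w.buf_miss * (2 * NVM_RD_ENG + NVM_WR_ENG)
            + w.buf_overflow_chkps * CHKPS_ENG)                  # Eq. (6)
    return read, write, chkp


def coach(w):
    read = w.reads * (3 * NVM_RD_ENG)                            # Eq. (7)
    write = w.writes * (2 * NVM_WR_ENG + NVM_RD_ENG)             # Eq. (8)
    chkp = w.chkps * CHKPS_ENG + w.flush_eng                     # Eq. (9)
    return read, write, chkp


# Hypothetical workload, chosen only to exercise the formulas.
w = Workload(reads=1000, writes=400, chkps=10, ldst_chkps=50,
             buf_hit=300, buf_miss=100, buf_overflow_chkps=2, srch_buf=0.1)

for name, model in (("InCoAv", incoav), ("InCoRe", incore), ("COACH", coach)):
    read, write, chkp = model(w)
    print(f"{name}: read={read:.1f} nJ, write={write:.1f} nJ, chkp={chkp:.1f} nJ")
```

With these made-up counts, COACH pays three times the read energy of the other two schemes but avoids every forced check-point, which mirrors the trade-off described above for load-heavy versus store-heavy benchmarks.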

It is noteworthy that COACH does not increase the memory access latency during normal execution. In other words, during a read operation, Mem[0], Mem[1], Tag Mem, the TBR Cntr register, and the SR Cntr register are accessed concurrently and, based on the Tag Mem, the data in Mem[0] or Mem[1] is returned to the processor. Therefore, the latency of a read operation is almost equal to the latency of a single NVM read access.

The peripheral circuits for selecting the suitable memory

for reading data have negligible latency compared to NVM

[Fig. 6 plot: axes Ppv (mW) and Vpv (V); curves for Irr = 71 W/m2, Irr = 175 W/m2, and Irr = 375 W/m2.]

Fig. 6. Power-Voltage curve of the energy harvester model proposed by Wang et al. [8]. The curve shows that the system achieves the maximum power when the bulk capacitor voltage is between 2.5 V and 3 V.

latency (see Table 2). On the other hand, during a write operation, the write controller must first read the Tag Mem, the TBR Cntr register, and the SR Cntr register concurrently and, based on the state of the data, select a suitable memory for the write operation. Accordingly, a write operation takes two clock cycles and imposes a performance overhead of one clock cycle on the system. This overhead could be hidden by exploiting a couple of buffers that keep the address and data of the write request and let the processor resume execution while the memory controller updates the corresponding memory fields.

From the check-point operation perspective, COACH does not impose any additional check-points on the system, and the energy consumption of each backup or rollback is equal to that of updating the TBR Cntr and SR Cntr registers, in addition to the flush cost. Although flush is an energy-consuming operation (since it resets all Time entries in Tag Mem), it occurs very rarely (especially when the Time field in Tag Mem and the TBR Cntr and SR Cntr registers are relatively large). Assuming a system containing 64 KB of (addressable) NVM memory in which each word is four Bytes, the Tag Mem would contain 16 K (16384) entries. Thus, if each Time field occupies 2 Bytes, each flush operation needs to reset 32 KB of memory (the size of Tag Mem). On the other hand, since the system needs to be flushed only after 65536 backup or rollback operations, the amortized cost (energy and latency) of flush is 0.5 Bytes (32 KB / 65536) per backup or rollback. The amortized cost of the flush operation could be reduced further by increasing the size of the Time field. For example, with three Bytes (four Bytes) for each Time entry, the amortized flush cost for every 1000 (200000) check-points would be only about three Bytes.
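The amortized flush figures can be checked with a few lines of arithmetic. The sketch below assumes, as in the example above, one Tag Mem entry per memory word and one flush every 2^(8 x Time field size in Bytes) backup or rollback operations; the function name `amortized_flush_bytes` is hypothetical.

```python
def amortized_flush_bytes(nvm_bytes, word_bytes, time_field_bytes):
    """Bytes of Tag Mem reset per backup/rollback, averaged over one flush period."""
    entries = nvm_bytes // word_bytes            # one Tag Mem entry per word
    flush_bytes = entries * time_field_bytes     # Time fields cleared by one flush
    period = 2 ** (8 * time_field_bytes)         # operations between two flushes (assumed)
    return flush_bytes / period


NVM_BYTES = 64 * 1024   # 64 KB of addressable NVM
WORD_BYTES = 4          # four-Byte words

two_byte = amortized_flush_bytes(NVM_BYTES, WORD_BYTES, 2)
three_byte = amortized_flush_bytes(NVM_BYTES, WORD_BYTES, 3)
four_byte = amortized_flush_bytes(NVM_BYTES, WORD_BYTES, 4)

print(f"2-Byte Time field: {two_byte} Bytes per operation")
print(f"3-Byte Time field: {three_byte * 1000:.2f} Bytes per 1000 operations")
print(f"4-Byte Time field: {four_byte * 200000:.2f} Bytes per 200000 operations")
```

For the 64 KB configuration this gives 0.5 Bytes per operation with a two-Byte Time field, and roughly three Bytes per 1000 and per 200000 operations with three-Byte and four-Byte fields, respectively.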

COACH exploits two memories (Mem[0], Mem[1]), a Tag Mem, and a control unit, which require more area than InCoRe [2] (two memories and one address buffer) and InCoAv [9] (no area overhead). However, in energy harvesting systems, energy plays an important role, and it directly influences the forward progress of the system. Furthermore, it is important to mention that the two memories and the Tag Mem are all NVM with almost zero leakage power consumption [23], [24]. The control unit is a simple circuit with low energy and latency overhead (the latency, leakage power, and dynamic energy of the control unit are presented in Table 2). Therefore, the additional circuits of COACH do not increase the energy consumption of the system noticeably. A brief comparison among existing MSP430 microcontrollers that benefit from FRAM technology is presented in Table 1. According to Table 1, by doubling the size of FRAM

TABLE 2
Setup configurations used in simulations.

Simulation configurations*

Parameter                                            Value
Processor Energy Per Instruction (No Memory Access)  0.16 nJ
FRAM Read Access Energy                              0.22 nJ
FRAM Write Access Energy                             0.34 nJ
SRAM (Cache) Access Energy                           0.05 nJ
Backup Energy (Processor)                            19.50 nJ
Restore Energy (Processor)                           16.90 nJ
Processor Leakage Power                              675 µW
Platform Power Consumption (without processor)       450 µW
FRAM Access Latency                                  125 ns
SRAM (Cache) Access Latency                          1 Cycle
Backup/Restore Latency (Processor)                   400 ns
Control Unit Access Latency                          0.98 ns
Control Unit Leakage Power                           10.09 µW
Control Unit Dynamic Energy                          0.63 pJ
Control Unit Area                                    3628 µm2
Processor Frequency                                  8, 16, 24 MHz
Bulk Capacitor Size                                  10, 50, 100, 150 µF
Stable Solar Irradiance (see Fig. 2b)                71, 175, 375 W/m2
Unstable Solar Irradiance (see Fig. 2a)              0-380 W/m2
Simulation Time (for each case)                      3.5 s

*Energy and latency of backup, restore, access to FRAM and SRAM were adopted from the MSP430FR5969 microcontroller by Texas Instruments® [12] and the microcontroller proposed by Khanna et al. [23]. The power consumption of the energy harvesting platform was adopted from Wang et al. [8]. The energy and latency characteristics of the control unit were estimated using Synopsys Design Compiler [25] with a 90 nm technology. The stable and unstable solar irradiance traces were taken from Gorlatova et al. [18].

memory, the overall cost of manufacturing these microcontrollers increases by only 5%-15%, which is tolerable.

4 EXPERIMENTS

In this section, we explore the effectiveness of COACH in handling the consistency challenge for nonvolatile processors used in energy harvesting systems. To this end, we first introduce the system setup used in our experiments, and then discuss the experimental results.

4.1 Experimental Setup

Since COACH is introduced for energy harvesting systems, the first step is selecting the input energy source for the system. Accordingly, we consider light as the input energy source as it is popular and available in both in-door and out-door locations [18]. Furthermore, the light power trace could be stable (for stationary devices), unstable (for mobile devices), strong (in the morning and afternoon), or weak (in the evening). Since the light irradiance (the power of the input source) is a function of time and location, for the sake of diversity in our explorations, we consider three stable power traces with low (71 W/m2), medium (175 W/m2), and high (375 W/m2) irradiances for stationary devices, in addition to an unstable power trace (0-380 W/m2) for mobile devices (the unstable input trace is depicted in Fig. 2a, and a sample of a stable power trace is depicted in Fig. 2b).

To mimic a proper behavior of the energy absorption in

[Fig. 7 plot: results for InCoAv, InCoRe, and COACH on each benchmark (dijkstra, ndes, ludcmp10, bsort, fft, edn, adpcm, Patricia, bitcount, StringSearch, corner, edge, matmult10, CRC32). Axes: Energy (pJ) and # of Insts (10^6). Series: Non-Memory EPI, Load-Store EPI, Backup-Rollback EPI, Total EPI.]

Fig. 7. The results for comparing energy consumption and forward-progress of COACH, InCoAv [9] and InCoRe [2] with the unstable power trace (see Fig. 2a), 24 MHz processor frequency, and 50 µF bulk capacitor.

energy harvesting systems, we consider the converter-less photovoltaic power system proposed by Wang et al. [8] with two photovoltaic cells (4.5 × 5.5 cm2) to absorb energy from ambient sources and store it in a relatively small bulk capacitor (10 µF to 150 µF).

Inconsistency is an important challenge for all nonvolatile memories used in check-pointing methods. In the experiments, however, we consider nonvolatile processors with Ferroelectric-CMOS technology as a case study, since a number of state-of-the-art studies fabricate nonvolatile processors using Ferroelectric-CMOS technology [22], [23], [26]. Furthermore, the MSP430FR microcontrollers by Texas Instruments [12] are commercial off-the-shelf NVPs with on-chip FRAM, which enabled us to adopt the energy consumption of instructions, FRAM accesses, and SRAM accesses from a real energy harvesting platform for the experiments (Table 2). We considered a nonvolatile processor equipped with 64 KB of FRAM and 0.5 KB of SRAM cache running at 8, 16, and 24 MHz. The considered configurations for COACH-enabled nonvolatile processors are in line with the MSP430FR microcontroller by Texas Instruments® [12] and the microcontroller proposed by Khanna et al. [23]. We estimated the area, latency, dynamic energy, and leakage power of the control unit using Synopsys Design Compiler [25] with a 90 nm technology.

We have exploited the harvester model proposed by Wang et al. [8]. The P-V curve of this model is depicted in Fig. 6. Based on this chart, the output power of the harvester is determined by the input irradiance and the voltage of the bulk capacitor. Furthermore, we integrated the power consumption of the energy harvesting platform, the latency and energy consumption of instruction executions, FRAM accesses, and SRAM accesses, and the leakage dissipation of the MSP430FR5969 microcontroller into the SimpleScalar simulator [27] (see Table 2). Therefore, for each instruction, the simulator calculates the harvested energy and the consumed energy to update the energy (voltage) of the bulk capacitor.
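The per-instruction update of the bulk capacitor can be sketched with the usual capacitor energy relation E = C V^2 / 2. This is an illustration and not the simulator's code: the function name `step_capacitor`, the single-cycle instruction time, and the harvested power of 1 mW are assumptions, and leakage and platform power are left out for brevity. In the actual setup the harvested power would come from the P-V curve of Fig. 6.

```python
import math


def step_capacitor(v_cap, capacitance, p_harvest, e_consumed, dt):
    """Return the bulk-capacitor voltage after one instruction.

    v_cap        voltage before the instruction (V)
    capacitance  bulk capacitor size (F)
    p_harvest    harvested power during the instruction (W), assumed constant
    e_consumed   energy consumed by the instruction (J)
    dt           duration of the instruction (s)
    """
    energy = 0.5 * capacitance * v_cap ** 2     # energy stored before the instruction
    energy += p_harvest * dt - e_consumed       # add harvested, subtract consumed
    energy = max(energy, 0.0)                   # the capacitor cannot go below empty
    return math.sqrt(2.0 * energy / capacitance)


# Example: one non-memory instruction (0.16 nJ, Table 2) at 24 MHz on a 50 uF
# capacitor charged to 2.5 V, with a hypothetical harvested power of 1 mW.
v_before = 2.5
v_after = step_capacitor(v_cap=v_before, capacitance=50e-6,
                         p_harvest=1e-3, e_consumed=0.16e-9, dt=1 / 24e6)
```

In this example the instruction consumes more energy than is harvested during its execution, so the voltage drops slightly; a system model would suspend execution once the voltage falls below the operating threshold and resume after the capacitor recharges.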

We have compared the results of COACH with two state-of-the-art approaches introduced in Section 2 (InCoAv [9] and InCoRe [2]). Fourteen different benchmarks (Patricia, StringSearch, BitCount, Dijkstra, MatMult, NDES, LUDCMP, Bubble Sort, FFT, EDN, ADPCM, Corner (Susan), Edge (Susan), and CRC) from the MiBench and Malardalen benchmark suites are used for the evaluations. In line with previous studies [16], [20], we have considered a watchdog counter that raises a backup signal after executing 50000 instructions as the check-pointing policy. The details of the simulation setup are presented in Table 2.

4.2 Results

The energy consumption and the forward-progress of an

application running on a typical energy harvesting system

have a close relationship with each other. Indeed, because

the input energy in these systems is limited and unstable,

the system will be inactive most of the time, and it will be

waiting for the bulk capacitor to be charged. Therefore, the

inefﬁcient use of the stored energy in the bulk capacitor will

slow down the forward-progress of the system.

Fig. 7 shows the energy consumption and forward-progress of different benchmarks for InCoAv [9], InCoRe [2], and COACH under the unstable power trace (see Fig. 2a) with a 24 MHz processor frequency and a 50 µF bulk capacitor. Since accessing memories during the execution of applications plays the main role in the energy consumption of the energy harvesting system, we classify the instructions based on their characteristics in accessing memories. To this end, we explore the contributions of instructions in three classes, i.e., non-memory instructions, load-store instructions, and backup-rollback instructions. We use the Energy Per Instruction (EPI) metric for this purpose. In Fig. 7, the bars show the average number of validly executed instructions, and the black line with star marks shows Total EPI. Besides evaluating Total EPI, we separately evaluate and illustrate the contributions of Non-Memory EPI (red line with diamond marks), Load-Store EPI (blue line with plus marks), and Backup-Rollback EPI (green line with circle marks) to provide more insight into how the different classes of instructions (introduced above) contribute to the Total EPI of an energy harvesting system.

1) Backup-Rollback EPI: As depicted in Fig. 7, Backup-Rollback EPI in InCoAv [9] is responsible for a large portion of Total EPI, while the impact of Backup-Rollback EPI on the Total EPI of InCoRe is much lower. Furthermore, the Backup-Rollback EPI of COACH is ideal since COACH benefits from fast and low-cost backup and rollback operations. It is noteworthy that for applications such as StringSearch, Corner, and Edge, with a high number of write accesses to different locations, InCoRe imposes a relatively high backup overhead on the system, since the address buffer (mentioned in Section 2) fills up frequently and imposes many additional backup operations on the system.

2) Load-Store EPI: From the Load-Store instructions

perspective, InCoAv [9] has ideal energy consumption since

it does not affect the normal read and write operations

of memory. On the other hand, in InCoRe, a signiﬁcant

[Fig. 8 plots: results for InCoAv, InCoRe, and COACH at 02:00 PM (Irr = 375 W/m2), 09:00 AM (Irr = 175 W/m2), and 05:00 PM (Irr = 71 W/m2). Axes: Time (%) and # of Insts (10^6), against Cap (µF) in panel (a) and Freq (MHz) in panel (b). Series: Inactive (Charging Capacitor), Valid Instruction, Failed Instruction, Backup-Rollback.]

(a) Capacitor capacity effects (24 MHz frequency)

(b) Processor frequency effects (50 µF capacitor)

Fig. 8. The effects of irradiance strength, processor frequency and capacity of the bulk capacitor in forward-progress of energy harvesting systems.

portion of the energy is consumed by store operations since it must search the entire address buffer (64 words in the experiment) for each write access. Therefore, InCoRe consumes a considerable amount of energy in applications with a high number of write operations, e.g., StringSearch, Corner, and Edge, which leads to slower forward-progress.

3) Non-Memory EPI: Considering Non-Memory

instructions, Fig. 7 illustrates that none of the three

approaches affect the non-memory instructions energy

consumption. Therefore, Non-Memory EPIs are relatively

the same for all approaches.

According to the above discussion and the results of Fig. 7, we can conclude that COACH has a lower EPI than InCoAv [9] and InCoRe [2] for almost all the benchmarks in our experiments, while it resolves the inconsistency problem. According to our observations, CRC and Matmult were the only benchmarks where COACH may impose more EPI or forward-progress penalties (compared to InCoAv and InCoRe) on the system. This is mainly due to the increased energy consumption of read operations (3 FRAM accesses) in COACH, while the other approaches do not affect the energy of read accesses. Thus, in applications like CRC and Matmult, in which the number of loads is significantly higher than the number of stores, COACH may impose EPI or forward-progress overheads on the system. The results show that, on average, COACH improves forward progress by 48% and 60% compared to the InCoRe and InCoAv approaches, respectively.

The forward-progress in each energy harvesting system

is not only affected by the efﬁciency of the EPI metric in that

system, but it is also affected by the strength of the input

power, the capacity of the system’s bulk capacitor (as the

system energy provider), and the operational frequency of

the system. Fig. 8 depicts the effects of capacitor’s size and

frequency on the forward-progress of a system with low (at

5 PM, 71 W/m2), medium (at 9 AM, 175 W/m2) and high (at

2 PM, 375 W/m2) irradiances for a stationary device with a

stable input power (see Fig. 2b). For the sake of brevity, only

the average results of the benchmarks are presented. Here,

the bars show the total validly executed instructions and the

lines show the portion of time that the system spent on: 1)

charging the bulk capacitor (blue line with circle marks), 2)

executing valid instructions (green line with star marks), 3)

executing failed instructions (red line with cross marks), and

4) executing backup-rollback instructions (black line with

diamond marks).

Considering Fig. 8a, exploiting a large capacitor could reduce the number of failed instructions, but on the other hand, it has some disadvantages, such as occupying a large area and imposing safety concerns (for wearable devices). In addition, charging a large capacitor requires a longer time, especially when the input power is weak (i.e., after 5 PM in out-door situations). The results in Fig. 8a show that by increasing the capacitor size from 10 µF to 50 µF, the number of failed instructions is reduced significantly, and increasing the size of the capacitor beyond 50 µF would have a negligible impact on the forward-progress. Hence, for COACH, exploiting 50 µF, 100 µF, and 150 µF capacitors improves the forward-progress by 36%, 43%, and 31% compared to a 10 µF capacitor, respectively.

Fig. 8b illustrates the effects of frequency changes in

an energy harvesting system equipped with a 50 µF bulk

capacitor. As mentioned earlier, in an energy

harvesting system, the input energy is limited, and the

system is in an inactive state (waiting for the bulk

capacitor to be charged) most of the time. Therefore,

increasing the frequency of an energy harvesting system

would not improve the forward-progress of the system

as it does for typical battery powered embedded systems.

Indeed, by increasing the frequency of the processor,

while instructions will be executed faster and the rate of

the energy consumption in the system will be increased,

charging rate of the bulk capacitor remains the same.

Furthermore, considering the FRAM technology, the

memory operates at 8 MHz frequency. Therefore, increasing

the frequency of the processor will not increase the speed

of memory access or backup (rollback) instructions. In

this regard, according to Fig. 8b in InCoRe and COACH,

increasing the frequency to 24 MHz improves the forward-

progress by up to 44%, but in InCoAv the forward-progress

would be improved up to 21% since InCoAv frequently

stops the execution for taking check-points.

5 RELATED WORK

The previous studies which have targeted the correct

functionality of the systems with unstable ambient power

could be classiﬁed into three groups. The studies in the

ﬁrst group recover the state of the system from previously

backed up information at FLASH memory in case of

system failure [4], [28]. Although these studies exploit

different techniques to reduce the number of system

failures, they suffer from high failure penalties since

backup and recovery operations from FLASH memory

impose noticeable overheads in terms of latency and power

consumption [1].

The second group of studies exploits hardware-based or

software-based techniques to prevent system failure in case

of power failure. Therefore, these studies do not consider any recovery mechanisms in their solutions. In the software-based approaches [6], [7], [15], [17], a task is partitioned into one or more atomic sections. Then, at the entry of each atomic section, the system checks the energy of the bulk capacitor and only executes the next atomic section if the capacitor has enough energy; otherwise, the system stalls until the capacitor receives enough charge. Unlike the software-based approaches, in hardware-based approaches [3], [29], [30] the system executes the tasks until it detects a failure in the input power. Then, the system begins to back up the dirty blocks of the SRAM memories (i.e., cache or scratchpad memory) in the nonvolatile memory.

There are some challenges that both the hardware and software approaches should consider, e.g., self-discharging of the bulk capacitor [8], [31], measuring the state of charge of the bulk capacitor [32], [33], and input power fluctuations. In the hardware-based approaches, since the state of the processor is unknown when the input power fails, the number of blocks that need to be backed up may be very small or very large (in the worst case, the system needs to back up all of the SRAM memory [34]). Accordingly, before a system failure, we cannot accurately estimate the amount of charge required in the bulk capacitor for the backup operations. Likewise, for the software-based approaches, estimating a safe worst-case energy consumption of a task (or section) requires conservative approaches [35], [36] and precise knowledge of the energy consumption of the platform, which could be complicated. Therefore, it seems that an efficient recovery mechanism should be considered alongside the approaches in the second group.

The studies in the third group propose a rollback operation to the last successful check-point when the system fails, without accessing FLASH memory or restarting the application from the beginning. In this regard, Hibernus++ [5] adaptively sets the threshold voltage to minimize the number of check-points for capacitor-less energy harvesting systems. These studies roll the system back to the last successful check-point in case of system failure, but they do not consider the data inconsistency problem. DINO [17], ALPACA [15], and CHAIN [7] proposed programming models and suggested breaking the program into a set of atomic tasks with the aid of programmers. To deal with data inconsistency, the system copies the task's data to a buffer and runs the task. The system backs up the buffer's data to the main memory if it completes the task without power interruption; otherwise, the system discards the buffer. The main drawback of these approaches is that the programmers need to have precise knowledge of the energy consumption and the capacitor size of the platform, which is a non-trivial assumption and causes poor re-usability of the software. In contrast, in Chinchilla [16], the programmers divide the program into code blocks, none of which exceeds the capacitor energy. Chinchilla exploits an operating-system-level approach to insert check-points adaptively between blocks to minimize the number of check-points while providing forward-progress. To ensure data consistency, Chinchilla uses a log file called the undo log, which logs the address and value of the data modified after the last successful check-point. However, running an operating system is not feasible for many ultra-low-power systems.

Senni et al. [2] approach exploits a memory architecture

which has duplicated nonvolatile memory as the main

memory and backup memory along with an address buffer.

During the normal execution of the program, backup

memory is off, and the buffer keeps track of the data which

their value has been changed. When the backup signal is

raised, dirty data in the main memory will be copied to

the backup memory. Thus, in case of soft error or power

failure, the main memory data will be recovered from

backup memory. Van Der Woude et al. [13], Xie et al. [9],

[14], propose a consistency-aware check-pointing scheme

which ﬁnds the same address referenced load and store

pairs and put a check-point before the store instruction

in each founded pair. While their approach does not have

any area overhead, it considerably increases the number of

backups in the program. Although the proposed approaches

in the third group have better energy consumption than

the proposed approaches in the ﬁrst group, they waste

a considerable amount of energy to guarantee the correct

functionality (as discussed in section 2).
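The load/store pairing attributed above to [9], [13], [14] can be sketched over a recorded memory trace. This is an illustration only: the cited works analyze the program itself, whereas this sketch assumes a simple dynamic trace of `("load" | "store", address)` tuples, and the function name is hypothetical. It places a check-point before any store to an address that was loaded, without first being written, since the previous check-point; re-executing such a region after a rollback would otherwise read the updated value.

```python
def checkpoints_before_stores(trace):
    """Return the trace indices of stores that need a check-point placed before them.

    trace: sequence of (op, address) tuples with op in {"load", "store"}.
    """
    read_first = set()  # addresses loaded before any store in the current region
    written = set()     # addresses already stored in the current region
    checkpoints = []
    for index, (op, address) in enumerate(trace):
        if op == "load":
            # A load of a value written in this region is reproduced on re-execution.
            if address not in written:
                read_first.add(address)
        elif op == "store":
            if address in read_first:
                # Load followed by store to the same address: cut the region here.
                checkpoints.append(index)
                read_first.clear()
                written.clear()
            written.add(address)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return checkpoints
```

Every such pair adds one check-point, which is why this style of scheme needs no extra hardware but can sharply increase the number of backups in programs that frequently update data in place.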

Compared with the previous studies mentioned above, in this study we propose COACH, an energy-efficient and consistency-aware memory scheme for energy harvesting nonvolatile processors. COACH benefits from fast and low-energy backup and rollback operations. COACH is a fully hardware-based approach; thus, it does not require any programmer or operating system interaction. COACH works with all check-pointing mechanisms, and it does not force any additional check-points on the system.

6 CONCLUSION

In this paper, we proposed COACH, an energy-efficient and consistency-aware memory scheme that guarantees correct functionality and consistency of the program in an energy harvesting system. COACH benefits from fast backup and rollback operations, which impose low energy overhead on the design. Unlike the previous consistency-aware approaches, COACH does not impose any additional check-points on the program. The results show that COACH guarantees the consistency of the program and improves the forward-progress of the system compared to the state-of-the-art approaches proposed by Xie et al. [9] and Senni et al. [2] by 60% and 48%, respectively.

REFERENCES

[1] Y. Liu, Z. Li, H. Li, Y. Wang, X. Li, K. Ma, S. Li, M.-F. Chang, S. John,

Y. Xie et al., “Ambient energy harvesting nonvolatile processors:

from circuit to system,” in Proceedings of the 52nd Annual Design

Automation Conference (DAC). ACM, 2015, p. 150.

[2] S. Senni, L. Torres, G. Sassatelli, and A. Gamatie, “Non-volatile

processor based on mram for ultra-low-power iot devices,” ACM

Journal on Emerging Technologies in Computing Systems (JETC),

vol. 13, no. 2, p. 17, 2017.

[3] M. Xie, M. Zhao, C. Pan, H. Li, Y. Liu, Y. Zhang, C. J.

Xue, and J. Hu, “Checkpoint aware hybrid cache architecture

for nv processor in energy harvesting powered systems,” in

Proceedings of the Eleventh IEEE/ACM/IFIP International Conference

on Hardware/Software Codesign and System Synthesis (CODES/ISSS).

ACM, 2016, p. 22.

[4] B. Ransford, J. Sorber, and K. Fu, “Mementos: System support

for long-running computation on RFID-scale devices,” ACM SIGPLAN

Notices, vol. 47, no. 4, pp. 159–170, 2012.

[5] D. Balsamo, A. S. Weddell, A. Das, A. R. Arreola, D. Brunelli,

B. M. Al-Hashimi, G. V. Merrett, and L. Benini, “Hibernus++:

a self-calibrating and adaptive system for transiently-powered

embedded devices,” IEEE Transactions on Computer-Aided Design

of Integrated Circuits and Systems (TCAD), vol. 35, no. 12, pp. 1968–

1980, 2016.

[6] H. Jayakumar, A. Raha, J. R. Stevens, and V. Raghunathan,

“Energy-aware memory mapping for hybrid fram-sram mcus

in intermittently-powered iot devices,” ACM Transactions on

Embedded Computing Systems (TECS), vol. 16, no. 3, p. 65, 2017.

[7] A. Colin and B. Lucia, “Chain: tasks and channels for reliable

intermittent programs,” in ACM SIGPLAN Notices, vol. 51, no. 10.

ACM, 2016, pp. 514–530.

[8] Y. Wang, Y. Liu, C. Wang, Z. Li, X. Sheng, H. G. Lee, N. Chang,

and H. Yang, “Storage-less and converter-less photovoltaic energy

harvesting with maximum power point tracking for internet of

things,” IEEE Transactions on Computer-Aided Design of Integrated

Circuits and Systems (TCAD), vol. 35, no. 2, pp. 173–186, 2016.

[9] M. Xie, M. Zhao, C. Pan, J. Hu, Y. Liu, and C. J. Xue, “Fixing

the broken time machine: Consistency-aware check-pointing for

energy harvesting powered non-volatile processor,” in Proceedings

of the 52nd Annual Design Automation Conference (DAC). ACM,

2015, p. 184.

[10] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, “Scalable high

performance main memory system using phase-change memory

technology,” ACM SIGARCH Computer Architecture News, vol. 37,

no. 3, pp. 24–33, 2009.

[11] J. Hu, C. J. Xue, Q. Zhuge, W.-C. Tseng, and E. H.-M. Sha,

“Data allocation optimization for hybrid scratch pad memory with

SRAM and nonvolatile memory,” IEEE Transactions on Very Large

Scale Integration (TVLSI) Systems, vol. 21, no. 6, pp. 1094–1102, 2013.

[12] Texas Instruments, “MSP430FRxx Microcontrollers,”

http://www.ti.com, 2019.

[13] J. Van Der Woude and M. Hicks, “Intermittent computation

without hardware support or programmer intervention,” in

12th USENIX Symposium on Operating Systems Design and

Implementation (OSDI), 2016, pp. 17–32.

[14] M. Xie, C. Pan, M. Zhao, Y. Liu, C. J. Xue, and J. Hu, “Avoiding

data inconsistency in energy harvesting powered embedded

systems,” ACM Transactions on Design Automation of Electronic

Systems (TODAES), vol. 23, no. 3, p. 38, 2018.

[15] K. Maeng, A. Colin, and B. Lucia, “Alpaca: intermittent execution

without checkpoints,” Proceedings of the ACM on Programming

Languages, vol. 1, no. OOPSLA, p. 96, 2017.

[16] K. Maeng and B. Lucia, “Adaptive dynamic checkpointing for safe

efficient intermittent computing,” in 13th USENIX Symposium on

Operating Systems Design and Implementation (OSDI), 2018, pp.

129–144.

[17] B. Lucia and B. Ransford, “A simpler, safer programming

and execution model for intermittent systems,” ACM SIGPLAN

Notices, vol. 50, no. 6, pp. 575–585, 2015.

[18] M. Gorlatova, A. Wallwater, and G. Zussman, “Networking low-

power energy harvesting devices: Measurements and algorithms,”

IEEE Transactions on Mobile Computing (TMC), vol. 12, no. 9, pp.

1853–1865, 2013.

[19] K. Ma, Y. Zheng, S. Li, K. Swaminathan, X. Li, Y. Liu, J. Sampson,

Y. Xie, and V. Narayanan, “Architecture exploration for ambient

energy harvesting nonvolatile processors,” in 21st International

Symposium on High Performance Computer Architecture (HPCA).

IEEE, 2015, pp. 526–537.

[20] Q. Liu and C. Jung, “Lightweight hardware support for

transparent consistency-aware checkpointing in intermittent

energy harvesting systems,” in 5th Non-Volatile Memory Systems

and Applications Symposium (NVMSA). IEEE, 2016, pp. 1–6.

[21] W. Song, Y. Zhou, M. Zhao, L. Ju, C. J. Xue, and Z. Jia,

“Emc: Energy-aware morphable cache design for non-volatile

processors,” IEEE Transactions on Computers, vol. 68, no. 4, pp. 498–

509, 2018.

[22] Y. Liu, F. Su, Y. Yang, Z. Wang, Y. Wang, Z. Li, X. Li, R. Yoshimura,

T. Naiki, T. Tsuwa et al., “A 130-nm ferroelectric nonvolatile

system-on-chip with direct peripheral restore architecture for

transient computing system,” IEEE Journal of Solid-State Circuits,

vol. 54, no. 3, pp. 885–895, 2019.

[23] S. Khanna, S. C. Bartling, M. Clinton, S. Summerfelt, J. A.

Rodriguez, and H. P. McAdams, “An fram-based nonvolatile

logic mcu soc exhibiting 100% digital state retention at vdd=0v

achieving zero leakage with <400-ns wakeup time for ulp

applications,” IEEE Journal of Solid-State Circuits, vol. 49, no. 1, pp.

95–106, 2014.

[24] S. C. Bartling, S. Khanna, M. P. Clinton, S. R. Summerfelt, J. A.

Rodriguez, and H. P. McAdams, “An 8mhz 75µa/mhz zero-

leakage non-volatile logic-based cortex-m0 mcu soc exhibiting

100% digital state retention at vdd=0v with <400ns wakeup

and sleep transitions,” in IEEE International Solid-State Circuits

Conference Digest of Technical Papers, Feb 2013, pp. 432–433.

[25] Synopsys Inc., “Design Compiler: RTL Modeling User Guide,”

http://www.synopsys.com, 2019.

[26] F. Su, Y. Liu, Y. Wang, and H. Yang, “A ferroelectric nonvolatile

processor with 46 µs system-level wake-up time and 14 µs sleep

time for energy harvesting applications,” IEEE Transactions on

Circuits and Systems I: Regular Papers, vol. 64, no. 3, pp. 596–607,

2016.

[27] D. Burger and T. M. Austin, “The simplescalar tool set, version

2.0,” ACM SIGARCH computer architecture news, vol. 25, no. 3, pp.

13–25, 1997.

[28] H. Li, Y. Liu, Q. Zhao, Y. Gu, X. Sheng, G. Sun, C. Zhang, M.-F.

Chang, R. Luo, and H. Yang, “An energy efficient backup scheme

with low inrush current for nonvolatile sram in energy harvesting

sensor nodes,” in Proceedings of Design, Automation & Test in Europe

Conference & Exhibition (DATE). EDA Consortium, 2015, pp. 7–12.

[29] Y. Liu, J. Yue, H. Li, Q. Zhao, M. Zhao, C. J. Xue, G. Sun, M.-F.

Chang, and H. Yang, “Data backup optimization for nonvolatile

sram in energy harvesting sensor nodes,” IEEE Transactions on

Computer-Aided Design of Integrated Circuits and Systems (TCAD),

vol. 36, no. 10, pp. 1660–1673, 2017.

[30] H. Jayakumar, A. Raha, W. S. Lee, and V. Raghunathan,

“Quickrecall: A hw/sw approach for computing across power

cycles in transiently powered computers,” ACM Journal on

Emerging Technologies in Computing Systems (JETC), vol. 12, no. 1,

p. 8, 2015.

[31] K. Ma, X. Li, H. Liu, X. Sheng, Y. Wang, K. Swaminathan, Y. Liu,

Y. Xie, J. Sampson, and V. Narayanan, “Dynamic power and

energy management for energy harvesting nonvolatile processor

systems,” ACM Transactions on Embedded Computing Systems

(TECS), vol. 16, no. 4, p. 107, 2017.

[32] S. Ulukus, A. Yener, E. Erkip, O. Simeone, M. Zorzi, P. Grover,

and K. Huang, “Energy harvesting wireless communications: A

review of recent advances,” IEEE Journal on Selected Areas in

Communications, vol. 33, no. 3, pp. 360–381, 2015.

[33] N. Michelusi, L. Badia, and M. Zorzi, “Optimal transmission

policies for energy harvesting devices with limited state-of-charge

knowledge,” IEEE Transactions on Communications, vol. 62, no. 11,

pp. 3969–3982, 2014.

[34] K. Ma, M. J. Liao, X. Li, Z. Huan, and J. Sampson, “Evaluating

tradeoffs in granularity and overheads in supporting nonvolatile

execution semantics,” in 18th International Symposium on Quality

Electronic Design (ISQED). IEEE, 2017, pp. 39–44.

[35] P. Wägemann, T. Distler, T. Hönig, H. Janker, R. Kapitza, and

W. Schröder-Preikschat, “Worst-case energy consumption analysis

for energy-constrained embedded systems,” in 27th Euromicro

Conference on Real-Time Systems (ECRTS). IEEE, 2015, pp. 105–

114.

[36] J. Morse, S. Kerrison, and K. Eder, “On the limitations of analyzing

worst-case dynamic energy of processing,” ACM Transactions on

Embedded Computing Systems (TECS), vol. 17, no. 3, p. 59, 2018.

Ali Hoseinghorban received his B.Sc. in

computer engineering from Shahid Beheshti

University and his M.Sc. from Sharif University

of Technology in 2015 and 2017, respectively.

He is currently a Ph.D. student in the Computer

Engineering department of Sharif University

of Technology. His research interests include

memory management in energy harvesting

systems, investigating the challenges of emerging

nonvolatile memories, and low-power real-time

embedded systems.

Amir Mahdi Hosseini Monazzah received his

Ph.D. degree in computer engineering from the

Sharif University of Technology, Tehran, Iran, in

2017. He was a member of the Dependable

Systems Laboratory from 2010 to 2017. As a

Visiting Researcher, he was with the Embedded

Systems Laboratory, University of California,

Irvine, CA, USA, from 2016 to 2017. As a postdoctoral

fellow, he was with the School of Computer

Science, Institute for Research in Fundamental

Sciences (IPM), Tehran, Iran, from 2017 to

2019. He is currently a faculty member of the School of Computer

Engineering, Iran University of Science and Technology (IUST), Tehran,

Iran. His research interests include investigating the challenges of

emerging nonvolatile memories, hybrid memory hierarchy design, and

IoT applications.

Mostafa Bazzaz received his B.Sc. in computer

engineering from Amirkabir University of

Technology and his M.Sc. from Sharif University

of Technology in 2009 and 2011, respectively.

He is currently a Ph.D. candidate in the

Computer Engineering department of Sharif

University of Technology. His research interests

include low-power multi-core embedded

systems, nonvolatile memories, and real-time

embedded systems.

Bardia Safaei (IEEE Student Member) received

his B.Sc. degree in computer engineering (CE)

from KNTU, Tehran, Iran, in 2014 and the M.Sc.

degree from Sharif University of Technology

(SUT), Tehran, Iran, in 2016.

He is currently pursuing the Ph.D. degree in

CE at SUT. He is also conducting research as

a visiting Ph.D. student at the Chair for Embedded

Systems at Karlsruhe Institute of Technology

(KIT), Germany. He has been a member

of the National Elites Foundation since

2016. His research interests include power efficiency and dependability

challenges in Internet of Things (IoT).

Alireza Ejlali received the Ph.D. degree in

computer engineering from the Sharif University

of Technology, Tehran, Iran, in 2006. He is an

Associate Professor of computer engineering

with the Sharif University of Technology. He is

currently the Director of the Embedded Systems

Research Laboratory, Department of Computer

Engineering, Sharif University of Technology. His

current research interests include low power

design, real-time embedded systems, and fault-

tolerant embedded systems.

iCheck: Progressive Checkpointing for Intermittent Systems

Article

Dec 2020

Energy harvesting devices powered by ambient energies, instead of batteries, have been drawn lots of attention due to their advantages of energy-saving, easy deployment without relying on stable power sources, and smaller sizes, facilitating promising applications, such as environmental and health monitoring. These devices perform the computations intermittently, where the code executions are halted and resumed depending on the availability of the harvested energy. On such devices, the capacitors are present and served as the energy buffers for preserving the program states when sudden power outages occur. Nevertheless, the capacitors have relatively shorter lifetimes, compared with the rest of hardware components on the devices, and larger capacitors, which are desired by the systems requiring complex computations, hamper the achievement of device miniaturization, e.g., for medical implants or smart dust. In this paper, we propose a new intermittent checkpointing strategy, iCheck, to tackle the issues raised for the program-state retaining when the capacitors are not functioning correctly (or when the capacitor-less devices are adopted). The proposed iCheck is designed to perform the checkpointing-based program-state preserving progressively with being aware of the power-failure characteristics of the harvested energy source to maximize the progress forwarding and to ensure data consistency while encountering incomplete checkpoints caused by sudden power losses. The proposed design is evaluated with a series of experiments with encouraging results.

A Cluster-Based and Drop-aware Extension of RPL to Provide Reliability in IoT Applications

Conference Paper

Full-text available

Apr 2021

The standardized IPv6 Routing Protocol for Low-power and Lossy Networks (RPL) has enabled efficient communications between thousands of smart devices, sensors, and actuators in a bi-directional, and end-to-end manner, allowing the connection of resource constraint devices in multi-hop IoT infras-tructures. RPL is designed to cope with the major challenges of Low-power and Lossy Networks (LLNs), specifically their energy-efficiency. However, RPL is facing with severe congestion and load balancing problems, leading to a low Packet Delivery Ratio (PDR) in the network. For the first time since the declaration of RPL, in this paper we explain that ignoring the specifications of the reception and transmission buffers in heterogeneous networks has caused these unbalanced traffic loads, leading to congestion, and consequently loss of packets in RPL. In order to resolve this problem, this paper introduces CBR-RPL; a lightweight RPL-based routing mechanism, which organizes the nodes into logical clusters and routes the packets through a novel drop-aware Objective Function (OF). The newly defined OF considers the queue occupancy of the nodes transceivers along with their drop rate simultaneously. According to an extensive set of experiments, which have been conducted via the Cooja simulator, it has been observed that the CBR-RPL improves the reliability in terms of PDR by 38.2%, and 75% compared to RPL and QU-RPL, respectively. In addition, CBR-RPL has also improved the amount of energy consumption in the nodes by up to 3× compared to the state-of-the-art, mainly due to imposing fewer control packets to the network.

Impacts of Mobility Models on RPL-Based Mobile IoT Infrastructures: An Evaluative Comparison and Survey

Article

Full-text available

Sep 2020

With the widespread use of IoT applications and the increasing trend in the number of connected smart devices, the concept of routing has become very challenging. In this regard, the IPv6 Routing Protocol for Low-power and Lossy Networks (PRL) was standardized to be adopted in IoT networks. Nevertheless, while mobile IoT domains have gained significant popularity in recent years, since RPL was fundamentally designed for stationary IoT applications, it could not well adjust with the dynamic fluctuations in mobile applications. While there have been a number of studies on tuning RPL for mobile IoT applications, but still there is a high demand for more efforts to reach a standard version of this protocol for such applications. Accordingly, in this survey, we try to conduct a precise and comprehensive experimental study on the impact of various mobility models on the performance of a mobility-aware RPL to help this process. In this regard, a complete and scrutinized survey of the mobility models has been presented to be able to fairly justify and compare the outcome results. A significant set of evaluations has been conducted via precise IoT simulation tools to monitor and compare the performance of the network and its IoT devices in mobile RPL-based IoT applications under the presence of different mobility models from different perspectives including power consumption, reliability, latency, and control packet overhead. This will pave the way for researchers in both academia and industry to be able to compare the impact of various mobility models on the functionality of RPL, and consequently to design and implement application-specific and even a standard version of this protocol, which is capable of being employed in mobile IoT applications.

Ensuring consistent recovery under power failure with minimal NVM write overhead

Article

Feb 2024
J SYST ARCHITECT

Rapid recovery of program execution under power failures for embedded systems with NVM

Article

Sep 2023
MICROPROCESS MICROSY

Non-Stop Microprocessor for Fault-Tolerant Real-Time Systems

Article

Jan 2023

It is very important to design an embedded real-time system as a fault-tolerant system to ensure dependability. In particular, when a power failure occurs, restart processing after power restoration is required in a real-time system using a conventional processor. Even if power is restored quickly, the restart process takes a long time and causes deadline misses. In order to design a fault-tolerant real-time system, it is necessary to have a processor that can resume operation in a short time immediately after power is restored, even if a power failure occurs at any time. Since current embedded real-time systems are required to execute many tasks, high schedulability for high throughput is also important. This paper proposes a non-stop microprocessor architecture to achieve a fault-tolerant real-time system. The non-stop microprocessor is designed so as to resume normal operation even if a power failure occurs at any time, to achieve little performance degradation for high schedulability even if checkpoint creations and restorations are performed many times, to control flexibly non-volatile devices through software configuration, and to ensure data consistency no matter when a checkpoint restoration is performed. The evaluation shows that the non-stop microprocessor can restore a checkpoint within 5μsec and almost hide the overhead of checkpoint creations. The non-stop microprocessor with such capabilities will be an essential component of a fault-tolerant real-time system with high schedulability.

A survey and experimental analysis of checkpointing techniques for energy harvesting devices

Article

Mar 2022
J SYST ARCHITECT

With the advent of ultra-low-power embedded processors, energy harvesting devices (EHDs) are becoming exceedingly prevalent. These devices are highly portable, self-sustainable, and once deployed, they can run for an extremely long time. They can thus be installed at hard-to-reach locations. Despite the benefits, it is challenging to use these devices as they rely on sporadic and variable sources of ambient energy, and are equipped with very small memories. The intermittent nature of the ambient energy leads to a loss of device state. Such repeated failures might cause non-termination of the programs executing on these devices. To achieve termination, we need to use state retention techniques that guarantee the programs’ forward progress. Checkpointing is the most common state retention technique. However, performing checkpointing arbitrarily can lead to inefficient and incorrect execution of the program. Thus, several approaches have been proposed to perform checkpointing intelligently. In this paper, we present a comprehensive survey of these checkpointing techniques. We also performed an extensive evaluation of 13 state-of-the-art approaches and showed detailed time and energy figures for these approaches. Our comparison provides dual benefits: (i) it tells the reader which classes of checkpointing approaches are the best, (ii) it shows the sensitivity of performance with respect to various external factors such as the nature of the energy source and the energy storage capacity. This survey will help researchers quickly understand the complexities and issues involved in creating checkpointing schemes. It will further enable them to efficiently design their programs by choosing the best checkpointing mechanism according to their requirements.

Deep Reinforcement Learning Guided Backup for Energy Harvesting Powered Systems

Article

Feb 2021

Energy harvesting technology has been widely developed as a promising alternative of battery to power embedded systems. However, energy harvesting powered embedded systems may have potential frequent power interruptions due to unstable energy supply. Non-volatile processors (NVPs) are proposed to survive power failures by saving volatile data to non-volatile memory upon power failures and resuming them after power comes back. Traditionally, backup is triggered immediately when energy warning occurs. However, it is also possible to more aggressively utilize the residual energy for program execution to improve forward progress. In this work, we propose a deep reinforcement learning guided backup strategy to improve forward progress in energy harvesting powered intermittent embedded systems. The experimental results show an average of 8.3%, 51.6% and 325.3% improved forward progress compared with Q-learning, the related work ALD and traditional instant backup, respectively.

PROWL: A Cache Replacement Policy for Consistency Aware Renewable Powered Devices

Article

Oct 2020

Energy harvesting systems powered by renewable energy sources employ hybrid volatile-nonvolatile memory to enhance energy efficiency and forward progress. These systems have unreliable power sources and energy buffers with limited capacity, so they complete long-running applications across multiple power outages. However, a power outage might cause data inconsistency, because the data in nonvolatile memories are persistent, while the data in volatile memories are unsteady. State of the art studies proposed various memory architectures and compiler-based techniques to tackle the data inconsistency in these systems. These approaches impose too many unnecessary check-points on the system to avoid data inconsistency. These studies did not consider the effect of cache memory to mask and postpone the imposed check-points on the system for the sake of consistency. In this paper, we utilize the cache memory and propose PROWL, a consistency aware cache replacement policy to avoid data inconsistency with fewer check-points. The results show that PROWL has by up to 85% fewer check-points compare to the state of the art approaches, and PROWL improves the average response time of the system by up to 65%.

LWRAP: A Low-Waste Reliable Adiabatic Platform

Article

Full-text available

Oct 2020
COMPUT ELECTR ENG

Given the importance of reducing energy consumption and the challenge of heat generation in classic CMOS circuits, adiabatic circuits are believed as an appropriate alternative. Most of the adiabatic circuit families come with a dual-rail structure, which provides them with an inherent hardware redundancy. Although this redundancy could be used for improving their reliability, no studies have been previously conducted to exploit this feature. In this regard, in this paper, we show that by exploiting the inherent hardware redundancy in adiabatic circuits, their reliability could be improved, while imposing a relatively low amount of energy overhead. Subsequently, with utilizing the outcome observations, we have proposed LWRAP, a fault-tolerant circuit design for dual-rail adiabatic families. While improving reliability, LWRAP employs the well-known dynamic frequency scaling approach for mitigating the amount of imposed energy overhead in the design. Our precise SPICE simulations have shown that LWRAP improves the reliability of the circuit against transient faults by up to 12x, depending on the technology size.

Dynamic Power and Energy Management for Energy Harvesting Nonvolatile Processor Systems

Article

Full-text available

May 2017

Self-powered systems running on scavenged energy will be a key enabler for pervasive computing across the Internet of Things. The variability of input power in energy-harvesting systems limits the effectiveness of static optimizations aimed at maximizing the input-energy-to-computation ratio. We show that the resultant gap between available and exploitable energy is significant, and that energy storage optimizations alone do not significantly close the gap. We characterize these effects on a real, fabricated energy-harvesting system based on a nonvolatile processor. We introduce a unified energy-oriented approach to first optimize the number of backups, by more aggressively using the stored energy available when power failure occurs, and then optimize forward progress via improving the rate of input energy to computation via dynamic voltage and frequency scaling and self-learning techniques. We evaluate combining these schemes and show capture of up to 75.5% of all input energy toward processor computation, an average of 1.54 × increase over the best static “Forward Progress” baseline system. Notably, our energy-optimizing policy combinations simultaneously improve both the rate of forward progress and the rate of backup events (by up to 60.7% and 79.2% for RF power, respectively, and up to 231.2% and reduced to zero, respectively, for solar power). This contrasts with static frequency optimization approaches in which these two metrics are antagonistic.

A 130-nm Ferroelectric Nonvolatile System-on-Chip With Direct Peripheral Restore Architecture for Transient Computing System

Article

Jan 2019

Owing to its unique capability to sustain computation progress over power outages, a nonvolatile processor (NVP) is promising for energy-harvesting-powered Internet-of-Things devices. However, the widespread application of NVP is continually blocked by the system integration issues and the configuration overheads of peripheral devices. This paper presents a nonvolatile system-on-chip (NVSoC) with improved integration level, power management flexibility, and system wake-up speed. An on-chip power management subsystem is designed to minimize the number of external components while supporting versatile power policies. And a direct peripheral restore architecture is outlined, which enables a fast and parallel re-configuration of peripheral devices after the resumption of power supply. A test chip is fabricated in a 130-nm ferroelectric-CMOS process with 22.09-mm <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> area. Measurement results show 6× higher data throughput as compared with a conventional NVP when facing power failures.

EMC: Energy-Aware Morphable Cache Design for Non-Volatile Processors

Article

Nov 2018

Wearable, implantable and Internet of Things devices are attracting increasing attention from both research and industry fields. Energy harvesting is a promising alternative of battery to power these embedded systems. However, the intrinsic instability of energy harvesting systems leads to potential frequent power interruptions. In traditional volatile processor, all the status will be lost at power failures and the program needs to re-start after power resumes. In order to survive the power failures and enable accumulative execution, non-volatile processor (NVP) is proposed to back up volatile information before power depletion and recover the system status after power resumes. Non-volatile memory (NVM) is typically attached for cache and main memory backup. There are researches working on optimization of the backup. However, little of them involve multiple level cell (MLC) NVM. In this work, we first discuss the benefit of applying MLC NVM for cache backup and the architecture of morphable hybrid cache, and then propose a three-stage energy-aware cache management strategy to improve the system performance and energy utilization while guaranteeing successful backups. Backup-aware cache replacement policies are also developed for backup optimization. Evaluation shows that the proposed EMC scheme can achieve 10.6% performance improvement and simultaneous 25.2% energy reduction when compared with the single level cell (SLC) based hybrid cache.

Avoiding Data Inconsistency in Energy Harvesting Powered Embedded Systems

Article

Mar 2018

Energy harvesting is becoming a favorable alternative to power future generation embedded systems, as it is more environmentally and user friendly. However, energy harvesting powered embedded systems suffer from frequent execution interruption due to unstable energy supply. To tackle this problem, nonvolatile memory has been deployed to save the whole volatile state for computation. When power resumes, the processor can restore the state back to volatile memories and continue execution. However, without careful consideration, the process of checkpointing and resuming could cause inconsistency between volatile and nonvolatile memories, which leads to irreversible errors. In this article, we propose a consistency-aware adaptive checkpointing scheme that ensures correctness for all checkpoints. The proposed technique efficiently identifies all possible inconsistency positions in programs and inserts auxiliary code to ensure correctness by offline analysis. In addition, adaptive checkpointing assisted register file profiling and online tracking techniques further reduce the overhead of each checkpoint. Evaluation results show that the proposed checkpointing strategy can successfully eliminate inconsistency errors and greatly reduce the checkpointing overhead.
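The inconsistency described in this abstract can be illustrated with a toy model. The sketch below is not the authors' technique; it only assumes a generic case in which a value in nonvolatile memory is read and then overwritten after the last checkpoint, so that re-executing from the checkpoint applies the update again. All names (`nvm`, `run_from_checkpoint`, `execute_with_failures`) are hypothetical.

```python
# Illustrative only: a toy model of re-execution after a power failure.
# The names used here are hypothetical and do not come from the cited article.

def run_from_checkpoint(nvm, fail_after_write):
    """Execute `x = x + 1` on a nonvolatile variable, starting at the checkpoint.

    The write lands in nonvolatile memory and survives a power failure,
    while the execution point is rolled back to the checkpoint.
    Returns True if the region completed, False if power failed.
    """
    tmp = nvm["x"]        # read from NVM into a volatile register
    nvm["x"] = tmp + 1    # write back to NVM (persists immediately)
    if fail_after_write:
        return False      # power lost before the next checkpoint
    return True


def execute_with_failures(failures):
    """Run the region, suffering `failures` power losses before it completes."""
    nvm = {"x": 0}
    for _ in range(failures):
        run_from_checkpoint(nvm, fail_after_write=True)  # rolled back, retried
    run_from_checkpoint(nvm, fail_after_write=False)
    return nvm["x"]
```

With no failures the region increments `x` once. Each failure that strikes after the write adds one more increment, which is the kind of error a consistency-aware checkpoint placement is meant to rule out.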

On the Limitations of Analyzing Worst-Case Dynamic Energy of Processing

Article

Feb 2018

This article examines dynamic energy consumption caused by data during software execution on deeply embedded microprocessors, which can be significant on some devices. In worst-case energy consumption analysis, energy models are used to find the most costly execution path. Taking each instruction’s worst-case energy produces a safe but overly pessimistic upper bound. Algorithms for safe and tight bounds would be desirable. We show that finding exact worst-case energy is NP-hard, and that tight bounds cannot be approximated with guaranteed safety. We conclude that any energy model targeting tightness must either sacrifice safety or accept overapproximation proportional to data-dependent energy.

Alpaca: intermittent execution without checkpoints

Article

Oct 2017

The emergence of energy harvesting devices creates the potential for batteryless sensing and computing devices. Such devices operate only intermittently, as energy is available, presenting a number of challenges for software developers. Programmers face a complex design space requiring reasoning about energy, memory consistency, and forward progress. This paper introduces Alpaca, a low-overhead programming model for intermittent computing on energy-harvesting devices. Alpaca programs are composed of a sequence of user-defined tasks. The Alpaca runtime preserves execution progress at the granularity of a task. The key insight in Alpaca is the privatization of data shared between tasks. Shared values written in a task are detected using idempotence analysis and copied into a buffer private to the task. At the end of the task, modified values from the private buffer are atomically committed to main memory, ensuring that data remain consistent despite power failures. Alpaca provides a familiar programming interface, a highly efficient runtime model, and places fewer restrictions on a target device's hardware architecture. We implemented a prototype of Alpaca as an extension to C with an LLVM compiler pass. We evaluated Alpaca, and directly compared to two systems from prior work. Alpaca eliminates checkpoints, which improves performance up to 15x, and avoids static multi-versioning, which improves memory consumption by up to 5.5x.
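The privatize-then-commit idea in this abstract can be sketched as follows. This is a simplified model, not Alpaca's implementation: Alpaca is a C extension with an LLVM pass and privatizes only the values its idempotence analysis selects, whereas this sketch buffers every write and models the commit as a single atomic step. The names `run_task`, `PowerFailure`, and `swap_and_count` are hypothetical.

```python
# Simplified sketch of task-private buffering with commit at task end.
# Assumption: every write is privatized and the commit is atomic.

class PowerFailure(Exception):
    """Raised to simulate losing power in the middle of a task."""


def run_task(task, main_memory, fail_after_writes=None):
    """Run `task` against a private buffer and commit only if it finishes."""
    private = {}
    writes = 0

    def read(name):
        # Prefer the task's own pending value, else fall back to main memory.
        return private[name] if name in private else main_memory[name]

    def write(name, value):
        nonlocal writes
        if fail_after_writes is not None and writes == fail_after_writes:
            raise PowerFailure()
        private[name] = value
        writes += 1

    try:
        task(read, write)
    except PowerFailure:
        return False             # private buffer is dropped, memory untouched
    main_memory.update(private)  # commit, modelled here as one atomic step
    return True


def swap_and_count(read, write):
    """Example task: swap two shared values and bump a counter."""
    a, b = read("a"), read("b")
    write("a", b)
    write("b", a)
    write("count", read("count") + 1)
```

If power fails after the first write, main memory still holds the values from before the task, so the task can simply be restarted from its beginning.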

Evaluating tradeoffs in granularity and overheads in supporting nonvolatile execution semantics

Conference Paper

Mar 2017

Energy-aware memory mapping for hybrid FRAM-SRAM MCUs in intermittently-powered IoT devices

Article

Apr 2017

Forecasts project that by 2020, there will be around 50 billion devices connected to the Internet of Things (IoT), most of which will operate untethered and unplugged. While environmental energy harvesting is a promising solution to power these IoT edge devices, it introduces new complexities due to the unreliable nature of ambient energy sources. In the presence of an unreliable power supply, frequent checkpointing of the system state becomes imperative, and recent research has proposed the concept of in-situ checkpointing by using ferroelectric RAM (FRAM), an emerging non-volatile memory technology, as unified memory in these systems. Even though an entirely FRAM-based solution provides reliability, it is energy inefficient compared to SRAM due to the higher access latency of FRAM. On the other hand, an entirely SRAM-based solution is highly energy efficient but is unreliable in the face of power loss. This paper advocates an intermediate approach in hybrid FRAM-SRAM microcontrollers that involves judicious memory mapping of program sections to retain the reliability benefits provided by FRAM while performing almost as efficiently as an SRAM-based system. We propose an energy-aware memory mapping technique that maps different program sections to the hybrid FRAM-SRAM microcontroller such that energy consumption is minimized without sacrificing reliability. Our technique consists of eM-map, which performs a one-time characterization to find the optimal memory map for the functions that constitute a program and energy-align, a novel hardware-software technique that aligns the system’s powered-on time intervals to function execution boundaries, which results in further improvements in energy efficiency and performance. Experimental results obtained using the MSP430FR5739 microcontroller demonstrate a significant performance improvement of up to 2x and energy reduction of up to 20% over a state-of-the-art FRAM-based solution. 
Finally, we present a case study that shows the implementation of our techniques in the context of a real IoT application.

Data Backup Optimization for Nonvolatile SRAM in Energy Harvesting Sensor Nodes

Article

Jan 2017

Nonvolatile SRAM (nvSRAM) has been widely investigated as a promising on-chip memory architecture in energy harvesting sensor nodes, due to zero standby power, resilience to power failures and fast read/write operations. However, conventional approaches back up all data from SRAM into nonvolatile memory (NVM) when power failures happen. This leads to significant energy overhead and peak inrush current, which have a negative impact on system performance and circuit reliability. This paper proposes a holistic data backup optimization to mitigate these problems in nvSRAM, consisting of a partial backup algorithm and a run-time adaptive write policy. A statistical dead-block predictor is employed to achieve dead block identification with trivial hardware overhead. An adaptive policy is used to switch between write-back and write-through strategies to reduce the rollback induced by backup failures. Experimental results show that the proposed scheme improves the performance by 4.6% on average while the backup power consumption and the inrush current are reduced by 38.1% and 54% on average compared to the full backup scheme. Moreover, the backup capacitor size for the energy buffer can be reduced by 40% on average under the same performance constraint.

Chain: tasks and channels for reliable intermittent programs

Article

Oct 2016

Energy harvesting computers enable general-purpose computing using energy collected from their environment. Energy-autonomy of such devices has great potential, but their intermittent power supply poses a challenge. Intermittent program execution compromises progress and leaves state inconsistent. This work describes Chain: a new model for programming intermittent devices. A Chain program is a set of programmer-defined tasks that compute and exchange data through channels. Chain guarantees forward progress at task granularity. A task is restartable and never sees inconsistent state, because its input and output channels are separated. Our system supports language features for expressing advanced data exchange patterns and for encapsulating reusable functionality. Chain fundamentally differs from state-of-the-art checkpointing approaches and does not incur the associated overhead. We implement Chain as C language extensions and a runtime library. We used Chain to implement four applications: machine learning, encryption, compression, and sensing. In experiments, Chain ensured consistency where prior approaches failed and improved throughput by 2-7x over the leading state-of-the-art system.
