
Bitstream encryption and authentication with AES-GCM in dynamically reconfigurable systems

October 2008


DOI:10.1109/FPL.2008.4629902

Source
IEEE Xplore

Conference: Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on

Authors:

Yohei Hori

National Institute of Advanced Industrial Science and Technology

Akashi Satoh

The University of Electro-Communications

Hirofumi Sakane

National Institute of Advanced Industrial Science and Technology

Kenji Toda

National Institute of Advanced Industrial Science and Technology

A high-speed and secure dynamic partial reconfiguration (DPR) system is realized with AES-GCM that guarantees both confidentiality and authenticity of FPGA bitstreams. In DPR systems, bitstream authentication is essential for avoiding fatal damage caused by unintended bitstreams. An encryption-only system can prevent bitstream cloning and reverse engineering, but cannot prevent erroneous or malicious bitstreams from being configured. Authenticated encryption is a relatively new concept that provides both message encryption and authentication, and AES-GCM is one of the latest authenticated encryption algorithms suitable for hardware implementation. We implemented the AES-GCM-based DPR system targeting the Virtex-5 device on an off-the-shelf board, and evaluated its throughput and hardware resource utilization. For comparison, we also implemented AES-CBC and SHA-256 modules on the same device. The experimental results showed that the AES-GCM-based system achieved higher throughput with less resource utilization than the AES/SHA-based system. The AES-GCM-module achieved more than 1 Gbps throughput and the entire system achieved about 800 Mbps throughput with reasonable resource utilization. This paper clarifies the advantage of using AES-GCM for protecting DPR systems.


Bitstream Encryption and Authentication using AES-GCM in Dynamically Reconfigurable Systems

Yohei Hori, Akashi Satoh, Hirofumi Sakane, and Kenji Toda

National Institute of Advanced Industrial Science and Technology (AIST)

1-1-1 Umezono, Tsukuba-shi, Ibaraki 305-8568, Japan

Abstract. A secure and dependable dynamic partial reconﬁguration (DPR) sys-

tem based on the AES-GCM cipher is developed, where the reconﬁgurable IP

cores are protected by encrypting and authenticating their bitstreams with AES-

GCM. In DPR systems, bitstream authentication is essential for avoiding fatal

damage caused by inadvertent bitstreams. Although encryption-only systems can

prevent bitstream cloning and reverse engineering, they cannot prevent erroneous

or malicious bitstreams from being accepted as valid. If a bitstream error is de-

tected after the system has already been partly conﬁgured, the system must be re-

conﬁgured with an errorless bitstream or at worst rebooted since the DPR changes

the hardware architecture itself and the system cannot recover itself to the initial

state by asserting a reset signal. In this regard, our system can recover from con-

ﬁguration errors without rebooting. To the authors’ best knowledge, this is the

ﬁrst DPR system featuring both bitstream protection and error recovery mecha-

nisms. Additionally, we clarify the relationship between the computation time and

the bitstream block size, and derive the optimal internal memory size necessary

to achieve the highest throughput. Furthermore, we implemented an AES-GCM-

based DPR system targeting the Virtex-5 device on an oﬀ-the-shelf board, and

demonstrated that all functions of bitstream decryption, veriﬁcation, conﬁgura-

tion, and error recovery work correctly. This paper clariﬁes the throughput, the

hardware utilization, and the optimal memory conﬁguration of said DPR system.

1 Introduction

Some recent Field-Programmable Gate Arrays (FPGAs) provide the ability of dynamic

partial reconﬁguration (DPR), where a portion of the circuit is replaced with another

module while the rest of the circuit remains fully operational. By using DPR, the func-

tionality of the system is reactively altered by replacing hardware modules according

to, for example, user requests, performance requirements, or environmental changes. To

date, various applications of DPR have been reported: content distribution security [1],

low-power crypto-modules [2], video processing [3], automotive systems [4], fault-

tolerant systems [5] and software-deﬁned radio [6] among others. It is expected that

in the near future, it will be more popular for mobile terminals and consumer electron-

ics to download hardware modules from the Internet in accordance with the intended

use.

In DPR systems where intellectual property (IP) cores are downloaded from net-

works, encrypting the hardware conﬁguration data (= bitstream) is a requisite for pro-

tecting the IP cores against illegal cloning and reverse engineering. Several FPGA fami-

lies have embedded decryptors and can be conﬁgured using encrypted bitstreams. How-

ever, such embedded decryptors are available only for the entire conﬁguration and not

for DPR. In addition to bitstream encryption, bitstream authentication is signiﬁcant for

protecting DPR systems [7]. Encryption-only systems are not suﬃciently secure as they

cannot prevent erroneous or malicious bitstreams from being used for conﬁguration.

Since DPR changes the hardware architecture of the circuits, unauthorized bitstreams

can cause fatal, unrecoverable damage to the system. In this regard, a mechanism of

error recovery is essential for the practical use of DPR systems. If a bitstream error

is detected after the bitstream has already been partly conﬁgured, the system must be

reconﬁgured with an errorless bitstream. Note that the system cannot be recovered by

asserting a reset signal since the hardware architecture itself has changed.

Based on the above considerations, we developed a DPR system which is capable of

protecting bitstreams using AES-GCM (Advanced Encryption Standard [8]-Galois/Counter

Mode [9,10]) and recovering from conﬁguration errors. To the authors’ best knowledge,

a DPR system featuring all mechanisms of bitstream encryption, bitstream veriﬁcation

and error recovery has not yet been developed, although several systems without recov-

ery mechanism have been reported so far [11–13].

AES-GCM is one of the latest authenticated encryption (AE) ciphers, which guarantee both the confidentiality and the authenticity of a message, and AE can therefore be applied effectively to DPR systems. Admittedly, data encryption and authentication

can be achieved with two separate algorithms, but if the area and speed performance

of the two algorithms are not balanced, the overall performance is determined by the

worse-performing algorithm. Therefore, AE is expected to enable more area-eﬃcient

and high-speed DPR implementations. Since other AE algorithms are not parallelizable

or pipelinable, and thus not necessarily suitable for hardware implementation [14], the

use of AES-GCM is currently the best solution for protecting bitstreams.

The conﬁguration of a downloaded IP core starts after its bitstream is successfully

veriﬁed. Bitstreams of large IP cores are split into several blocks, and veriﬁcation is per-

formed for each block. If the bitstream veriﬁcation of a particular block fails after some

other blocks have already been conﬁgured, the conﬁguration process is abandoned, and

reconﬁguration starts with an initialization bitstream. In this conﬁguration method, the

size of the split bitstream signiﬁcantly inﬂuences both the speed and the area perfor-

mance. Since the decrypted bitstream must not ﬂow out of the device and is thus stored

to the internal memory, the size of the split bitstream determines the required memory

resources. Although it is often thought that the speed performance can be improved by

increasing the size of the available memory, our study revealed that the overall through-

put can be maximized by using optimally sized internal memory.

This paper describes the architecture, memory conﬁguration, implementation re-

sults, and performance evaluation of an AES-GCM-based DPR system featuring an

error recovery mechanism. The system is implemented targeting Virtex-5 on an oﬀ-the-

shelf board, and we demonstrate that its mechanisms of bitstream encryption, veriﬁca-

tion and error recovery work successfully. The rest of this paper is organized as follows.

Section 2 introduces past studies on DPR security. Section 3 explains the process of

partial reconﬁguration of a Xilinx FPGA. Section 4 brieﬂy explains the cryptographic

algorithms related to our implementation. Section 5 describes the architecture of our

DPR system and explains the functions implemented in it. Section 6 determines the op-

timal memory conﬁguration of the DPR system and describes the experimental results,

implementation results, and evaluation of the systems. Finally, Section 7 summarizes

this paper and presents future work.

2 Related Work

Xilinx Virtex series devices support conﬁguration through encrypted bitstreams by uti-

lizing built-in bitstream decryptors. Virtex-II and Virtex-II Pro support the Triple Data

Encryption Standard (Triple-DES) [15] with a 56-bit key, while Virtex-4 and Virtex-5

support AES with a 256-bit key. The key is stored to the dedicated volatile memory

inside the FPGA. Therefore, the storage must always be supplied with power through

an external battery. Unfortunately, the functionality of conﬁguration through encrypted

bitstreams is not available when using DPR, and if the device is conﬁgured using the

built-in bitstream decryptor, the DPR function is disabled. Therefore, in DPR systems,

partial bitstreams must be decrypted by utilizing user logic.

Bossuet et al. proposed a secure conﬁguration method for DPR systems [11]. Their

system allows the use of arbitrary cryptographic algorithms since the bitstream decryp-

tor itself is implemented as a reconﬁgurable module. However, although their method

uses bitstream encryption, it does not consider the authenticity of the bitstreams.

Zeineddini and Gaj developed a DPR system which uses separate encryption and au-

thentication algorithms for bitstream protection [12], where AES was used for bitstream

encryption and SHA-1 for authentication. AES and SHA-1 were implemented as C pro-

grams and run on two types of embedded microprocessors: PowerPC and MicroBlaze.

The total processing times needed for the authentication, decryption, and conﬁguration

of a 14-KB bitstream on PowerPC and MicroBlaze were approximately 400 ms and 2.3

sec, respectively. Such performances, however, would be insuﬃcient for practical DPR

systems.

Parelkar used AE to protect FPGA bitstreams [13], and implemented various AE algorithms: Offset CodeBook (OCB) [16], Counter with CBC-MAC (CCM) [17] and EAX [18] modes of operation with AES. To compare the performance of AE with that of separate encryption and authentication, SHA-1 and SHA-512 were also implemented and combined with AES-ECB (Electronic CodeBook).

3 Partial Reconﬁguration of FPGAs

This section brieﬂy describes the architecture of Xilinx FPGAs and the features of par-

tial reconﬁguration with Xilinx devices. Detailed information about Xilinx FPGAs can

be found in [19,20]. For more detailed information about Xilinx partial reconﬁguration,

see [21].

3.1 Xilinx FPGA

Xilinx FPGAs consist of Conﬁgurable Logic Blocks (CLBs), which compute various

logic, and an interconnection area which connects the CLBs. CLBs are composed of

several reconﬁgurable units called slices, and slices in turn contain several look-up ta-

bles (LUTs), which are the smallest reconﬁgurable logic units. In Virtex-5, each CLB

contains two slices, and each slice contains four 6-input LUTs. In Virtex-4 and earlier

Virtex series devices, each CLB contains four slices, and each slice contains two 4-input

LUTs. While the LUTs can be used as memory, Xilinx FPGAs also contain dedicated

memory blocks referred to as BlockRAMs or BRAMs.

[Figure 1 labels: static module, PRR, PRM, bus macro, ICAP, Reconf Ctrl, Decryptor, Authenticator, RAM; signals: encrypted bitstream, decrypted bitstream, AUTH, config]

Fig. 1. Structure of a partially reconfigurable circuit in a Xilinx FPGA.

[Figure 2 labels: Virtex-II Pro, Virtex-5, Partially Reconfigurable Module (PRM), Frame, Clock Region Boundary, 20 CLBs]

Fig. 2. Frame of Xilinx FPGAs.

3.2 Partial Reconﬁguration Overview

In Xilinx FPGAs, modules which can be dynamically replaced are called Partially Re-

conﬁgurable Modules (PRMs), and the areas where PRMs are placed are called Par-

tially Reconﬁgurable Regions (PRRs). PRMs are rectangular and can be of arbitrary

size. Figure 1 shows an example structure of the partially reconﬁgurable design.

The smallest unit of a bitstream which can be accessed is called a frame. In Virtex-5

devices, a frame designates a 1312-bit piece of conﬁguration information corresponding

to the height of 20 CLBs. A bitstream of PRMs is a collection of frames. In Virtex-II

Pro and earlier Virtex devices, the height of the frame is the same as the height of the

device. Figure 2 illustrates the frames of Virtex-II Pro and Virtex-5.

3.3 Bus Macro

All signals between the PRMs and the fixed modules must pass through bus macros in order to lock the wiring. In Virtex-5 devices, the bus macros are 4-bit-wide pre-routed macros composed of four 6-input look-up tables (LUTs), and they must be placed inside the PRMs. In contrast, the bus macros of older device families are 8-bit-wide pre-routed macros composed of sixteen 4-input LUTs, which are placed on the PRM boundary.

3.4 Internal Conﬁguration Access Port

Virtex-II and newer Virtex series devices support self DPR through the Internal Configuration Access Port (ICAP). The ICAP works in practically the same manner as the SelectMAP configuration interface. Since user logic can access the configuration memory through the ICAP, the partial reconfiguration of FPGAs can be controlled by internal user logic. In Virtex-5 devices, the data width of the ICAP can be set to 8, 16 or 32 bits.

[Figure 3 labels: cnt 0, cnt 1, cnt 2, ..., cnt n (each obtained by +1); Enc; P2, ..., Pn; H; Len; Auth; TAG]

Fig. 3. Example operation of the Galois/Counter Mode (GCM).

4 Cryptographic Algorithm

4.1 Advanced Encryption Standard

AES is a symmetric-key block cipher standardized by the U.S. National Institute of Standards and Technology (NIST) [8]. AES replaces the previous Data Encryption Standard (DES) [22], whose 56-bit key is now considered too short to be secure. The block length of AES is 128 bits, and the key length can be set to 128, 192, or 256 bits.

4.2 Galois/Counter Mode of Operation

The GCM [9] is one of the latest modes of operation standardized by NIST [10]. Figure 3 shows an example of GCM operation.

In order to generate a message authentication code (MAC), which is also called a security tag, GCM uses universal hashing based on product-sum operations in the finite field $GF(2^{128})$. The product-sum operation in $GF(2^{128})$ enables faster and more compact hardware implementation than integer computation. The encryption and decryption scheme of GCM is based on the CTR mode of operation [23], which can be highly parallelized and pipelined. Therefore, GCM is suitable for hardware implementation and covers a wide range of design points, from compact to high-speed [24, 25]. Other AE algorithms are not necessarily suitable for hardware implementation as they are impossible to parallelize or pipeline [14].

AES-GCM is the GCM application which uses AES as the encryption core. Since AES is also based on product-sum operations in a binary finite field, $GF(2^{8})$, either compact or high-speed hardware implementation is possible. Therefore, the use of AES-GCM can meet various performance requirements and is the best solution for protecting FPGA bitstreams in DPR systems.
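The counter sequence of Figure 3 (cnt 0, cnt 1, ..., cnt n) can be sketched in a few lines of Python. This is only an illustration of the standard GCM rule for a 96-bit IV, as used in this system: cnt 0 is the IV followed by a 32-bit counter set to 1, and each later block increments that 32-bit field. The function names `inc32` and `counter_blocks` are hypothetical, and the AES encryption of each counter block is deliberately left out.

```python
def inc32(block: bytes) -> bytes:
    """Increment the rightmost 32 bits of a 16-byte block modulo 2**32."""
    head, ctr = block[:12], int.from_bytes(block[12:], "big")
    return head + ((ctr + 1) % (1 << 32)).to_bytes(4, "big")


def counter_blocks(iv: bytes, n_blocks: int) -> list:
    """Return [cnt 0, cnt 1, ..., cnt n] for a 96-bit IV.

    cnt 0 = IV || 0^31 || 1. In GCM, the encryption of cnt 0 masks the tag,
    and the encryptions of cnt 1..cnt n are XORed with the data blocks.
    """
    if len(iv) != 12:
        raise ValueError("this sketch only handles 96-bit IVs")
    blocks = [iv + b"\x00\x00\x00\x01"]
    for _ in range(n_blocks):
        blocks.append(inc32(blocks[-1]))
    return blocks


if __name__ == "__main__":
    for blk in counter_blocks(bytes(12), 2):
        print(blk.hex())
```

Because every counter block is known as soon as the IV is known, the AES encryptions of different blocks do not depend on each other, which is what makes the CTR part of GCM easy to parallelize and pipeline.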

[Figure 4 blocks: Host Computer, UART, Main CTRL, AES CTRL, AES-GCM, Reconf CTRL, SSRAM CTRL, SSRAM, Internal RAM, ICAP, PRM, bus macro, LEDs; signals: bitstream / command, uart_dat, aes_Din, aes_Dout, aes_Dvld, Drdy, Krdy, IVrdy, LENrdy, TGvld, TAG, AUTH, aes trig, wr_dat, rd_dat, ssram_dat, ssram_addr, ssram_we, ram_addr, ram_dout, icap_din, icap_we, icap_clk, icap_bsy, reconf bsy, rst]

Fig. 4. Overview of the system using AES-GCM.

5 AES-GCM-based DPR Systems

This section describes the architecture of our DPR system, which uses AES-GCM for

bitstream encryption/decryption and veriﬁcation and is capable of recovering from con-

ﬁguration errors. Figure 4 shows a block diagram of said system. The length of the AES

key and the initial vector (IV) are set to 128 bits and 96 bits, respectively, and the AES

key is embedded into the system.

5.1 Conﬁguration Flow Overview

Encrypted bitstreams of PRMs are transferred from the host computer via RS232 and are stored in the external 36x256K-bit SSRAM. The configuration of the PRM starts when a configuration command is sent from the host computer. The downloaded bitstreams are decrypted by the AES-GCM module, and their authenticity is verified simultaneously. Since the plain bitstreams must not leak out of the device, the decrypted bitstreams must be stored in the internal memory (Block RAM). Furthermore, since the internal memory is relatively small, large bitstreams are split into several blocks, and decryption and verification are performed on each bitstream block. To distinguish the divided bitstream block from the AES 128-bit data block, we define the former as a Bitstream Block (BSB). In the system, the internal memory is organized as 128-bit words with a power-of-two depth, and is at most 128x8192 bits (1 Mb) due to device resource limitations. After the integrity of the bitstream has been verified, the decrypted bitstream is read from the internal memory and transferred to the ICAP to configure the PRM.

Note that AES-GCM requires initial processing such as key scheduling and IV setup

for each BSB. Therefore, the computation eﬀort for the same bitstream increases with

the number of BSBs. The smaller the internal memory is, the more compact the sys-

tem will be; however, computation eﬀort will increase. Conversely, if the memory size

is large, computation eﬀort will decrease, although the system will require more hard-

ware resources. Furthermore, since additional data such as a security tag, IV, and data

length, are attached to each BSB, the size of the downloaded data increases with the

number of BSBs. The trade-oﬀ between internal memory size, downloaded data size

and computation eﬀort is clariﬁed in Section 5.3 and Section 5.4.

[Figure 5 labels: Total Length; Security Tag, Block Length, Initial Vector (repeated for each BSB); Encrypted Bitstream Block 1; Encrypted Bitstream Block 2; dimension labels: address 0, 32 bit, b bits, 128 bits, 96 bits]

Fig. 5. General structure of bitstreams stored to SSRAM.

One concern is that simply dividing a bitstream into several BSBs leaves the system vulnerable to the removal or insertion of a BSB. Although AES-GCM can detect tampering within a BSB, it does not check the number or order of successive BSBs. For example, if one of the successive BSBs is removed, AES-GCM cannot detect its disappearance, and the system would be incompletely configured. In addition, if a malicious BSB with a correct security tag is inserted into the series of BSBs, AES-GCM will recognize the inserted BSB as legitimate, and the malicious BSB will be configured in the device, causing system malfunction, data leakage and so on. Therefore, a protection scheme that prevents BSB removal and insertion is necessary for DPR systems. Such a scheme is discussed in Section 5.7.

5.2 Data Structure

In order to decrypt a PRM bitstream with AES-GCM, information about the security tag, data length, and IV needs to be prepended to the bitstream. Large bitstreams are divided into several BSBs, and each BSB carries such header information. In addition, the first BSB contains information about the total bitstream length. Figure 5 shows the structure of the downloaded bitstream together with the header information, which is loaded from SSRAM and set to the registers in the AES-GCM module when the PRM configuration begins.

5.3 Bitstream Decryption and Veriﬁcation

In the AES-GCM module, the major component (the S-box) is implemented using a composite field. The initial setup of AES-GCM takes 59 cycles, and the first BSB takes 19 additional cycles for setting up the total length of the entire bitstream. A 128-bit data block is decrypted in 13 clock cycles, including SSRAM access time, and the decrypted data are stored in the internal memory. The last block of a BSB requires 10 clock cycles in addition to the usual 13 in order to calculate the security tag. The security tag is calculated using the GHASH function defined below, where A is the additional authenticated data, C is the ciphertext and H is the hash subkey.











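$$
X_i =
\begin{cases}
0 & i = 0 \\
(X_{i-1} \oplus A_i) \cdot H & i = 1, \ldots, m-1 \\
(X_{m-1} \oplus (A_m^{*} \,\|\, 0^{128-v})) \cdot H & i = m \\
(X_{i-1} \oplus C_{i-m}) \cdot H & i = m+1, \ldots, m+n-1 \\
(X_{m+n-1} \oplus (C_n^{*} \,\|\, 0^{128-u})) \cdot H & i = m+n \\
(X_{m+n} \oplus (\mathrm{len}(A) \,\|\, \mathrm{len}(C))) \cdot H & i = m+n+1
\end{cases}
\tag{1}
$$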

The final value $X_{m+n+1}$ is the output of GHASH; GCM obtains the security tag by XORing this value with the encrypted initial counter block (cnt 0 in Figure 3). In the GHASH function, the 128 x 128-bit multiplication over the Galois field (GF) is performed by using a 128 x 16-bit GF multiplier eight times in order to save hardware resources. Fig. 6 shows the GF multiplier implemented in the AES-GCM module. The partial products of the 128 x 16-bit multiplier are accumulated in the 128-bit register Z. The calculation of Z finishes in 8 clock cycles.
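The GHASH recurrence and the digit-serial multiplication can be illustrated with a short software model. The following Python sketch follows the bit ordering and reduction polynomial of the GCM specification; it is not the authors' hardware design. The loop is arranged as eight 16-bit digits only to mirror the "128 x 16-bit multiplier used eight times" description, and the names `gf128_mul` and `ghash` are chosen here for illustration.

```python
R = 0xE1 << 120  # reduction constant for x^128 + x^7 + x^2 + x + 1 (GCM bit order)


def gf128_mul(x: int, y: int) -> int:
    """Multiply two GF(2^128) elements held as 128-bit integers.

    GCM convention: the most significant bit is the coefficient of x^0.
    """
    z, v = 0, y
    for digit in range(8):          # eight 16-bit digits of x
        for bit in range(16):       # bits inside one digit, leftmost first
            if (x >> (127 - (16 * digit + bit))) & 1:
                z ^= v              # accumulate the partial product into Z
            v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash(h: int, aad: bytes, ciphertext: bytes) -> int:
    """Evaluate the GHASH recurrence X_i = (X_{i-1} xor block_i) * H."""

    def blocks(data: bytes):
        # Split into 128-bit blocks, zero-padding the last partial block.
        for i in range(0, len(data), 16):
            yield int.from_bytes(data[i:i + 16].ljust(16, b"\0"), "big")

    x = 0
    for blk in list(blocks(aad)) + list(blocks(ciphertext)):
        x = gf128_mul(x ^ blk, h)
    # Final block: len(A) || len(C), both as 64-bit bit counts.
    lengths = ((8 * len(aad)) << 64) | (8 * len(ciphertext))
    return gf128_mul(x ^ lengths, h)


if __name__ == "__main__":
    h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E  # AES-128 of the zero block under the zero key
    c = bytes.fromhex("0388dace60b6a392f328c2b971b2fe78")
    print(f"{ghash(h, b'', c):032x}")
```

In hardware, the inner 16 iterations would be evaluated combinationally in one clock cycle, which is where a latency of 8 cycles per multiplication would come from; the software loop above trades that parallelism for readability.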

An example timing chart of the AES-GCM module including the initial setup is shown in Figure 7. Suppose that the size of the entire bitstream is S bits, and that it is split into n BSBs. Let the size of the k-th BSB be $b_k$ bits, and let $b_1, b_2, \ldots, b_{n-1}$ all have the same size b. Then, the entire size S is expressed as follows:

$$
S = \sum_{k=1}^{n} b_k = \sum_{k=1}^{n-1} b + b_n = (n-1) \cdot b + b_n. \tag{2}
$$

As Figure 7 illustrates, the required number of clock cycles $T_{aes}$ for the decryption and verification of the entire bitstream is

$$
\begin{aligned}
T_{aes} &= 19 + (n-1) \cdot \left(59 + 13 \cdot \frac{b}{128} + 10 + 2\right) + \left(59 + 13 \cdot \frac{b_n}{128} + 10 + 2\right) \\
&= 19 + \frac{13 (n-1) b + 13 b_n}{128} + 71 n \\
&= \frac{13 S}{128} + 71 n + 19 \quad (\because S = (n-1) b + b_n).
\end{aligned}
\tag{3}
$$

As the above equation indicates, the computation effort for AES-GCM increases with the number of BSBs n.

5.4 PRM Conﬁguration

Unlike other DPR systems, our system does not use an embedded processor to control the partial reconfiguration. The input data and control signals of the ICAP are directly connected to and controlled by the user logic. Thus, our system is free from the delay of processor buses. In the system, the width of the ICAP data port is set to 32 bits. When the frequency of the input data to the ICAP is f [MHz], the throughput of the reconfiguration process $P_{icap}$ is

$$
P_{icap} = 32 f \ \text{[Mbps]}. \tag{4}
$$

[Figure 6 labels: 128-bit x 16-bit multiplier; 128-bit operands with bit indices 127, 126, ...]

Fig. 6. The architecture of the Galois Field multiplier.

[Figure 7 labels: signals Krdy, IVrdy, LENrdy, TGrdy, Drdy, TGvld, AUTH, Reconf, Reconf_BSY; phases for the 1st bitstream block and the 2nd bitstream block (last block): AES initial setup, Decryption & Verification, PRM config; annotated durations: 78, 13*s/128 + 10, s/32, s/32 + 5 (block size = s [bit])]

Fig. 7. Timing chart of decryption, verification, and reconfiguration.

In Virtex-5, the maximum frequency of the ICAP is limited to 100 MHz; thus the ideal throughput of the reconfiguration process is 3,200 Mbps.

Figure 7 also shows the timing of the configuration of the PRM bitstream. When the size of a BSB is b bits, the configuration of the BSB finishes in b/32 cycles. The last BSB takes 5 additional cycles to flush the buffer in the device. Therefore, the required number of computation cycles for the PRM configuration $T_{reconf}$ is

$$
T_{reconf} = (n-1) \cdot \frac{b}{32} + \frac{b_n}{32} + 5 = \frac{S}{32} + 5 \quad (\because S = (n-1) b + b_n). \tag{5}
$$
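As a quick numerical check of the ICAP throughput and the configuration cycle count, the two relations can be evaluated directly. The Python sketch below assumes the 32-bit ICAP port and the 5 flush cycles stated above; the function names and the example bitstream size (the 87,200-bit up-counter PRM of Section 6.1) are picked for illustration only.

```python
def icap_throughput_mbps(f_mhz: float, width_bits: int = 32) -> float:
    """Ideal ICAP throughput in Mbps: port width times clock frequency in MHz."""
    return width_bits * f_mhz


def reconf_cycles(total_bits: int, width_bits: int = 32, flush_cycles: int = 5) -> int:
    """Cycles to push a bitstream of total_bits through the ICAP, plus the final flush."""
    words = -(-total_bits // width_bits)  # ceiling division
    return words + flush_cycles


if __name__ == "__main__":
    f_mhz = 100.0
    cycles = reconf_cycles(87_200)
    print(icap_throughput_mbps(f_mhz), "Mbps")
    print(cycles, "cycles,", cycles / f_mhz, "microseconds at", f_mhz, "MHz")
```

The flush overhead is a fixed 5 cycles, so for bitstreams of this size the configuration time is dominated by the S/32 term.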

5.5 Error Recovery

In the system, the ﬁrst several bytes of the SSRAM are reserved for the initialization

PRM, which is used for recovering the system from DPR errors. The use of the initial-

ization PRM enables the system to return to the start-up state without rebooting the en-

tire system. Thus, processes executed in other modules can retain their data even when

DPR errors occur. The bitstream of the initialization PRM is encrypted and processed

in the same way as that of other PRMs. If the bitstream size is S bits, the computation

time for decryption, veriﬁcation, and conﬁguration is derived from equations (3) and

(5).

When bitstream veriﬁcation fails with AES-GCM, the current process is abandoned

and conﬁguration of the initialization PRM is started. Note that the unauthorized BSB is

still in the internal memory and it will be overwritten by the initial PRM. Therefore, the

unauthorized bitstream will be safely erased and will not be conﬁgured in the system. If

the veriﬁcation of the initialization PRM fails due to, for example, bitstream tampering

or memory bus damage, the system discontinues the conﬁguration process and prompts

the user to reboot the system.
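The recovery policy described above can be summarized as a small control-flow sketch in Python. It models only the decisions (verify each BSB, fall back to the initialization PRM, ask for a reboot if that also fails); `verify` and `configure` are hypothetical callbacks standing in for the AES-GCM module and the ICAP controller, and the `Outcome` names are invented for this illustration.

```python
from enum import Enum


class Outcome(Enum):
    CONFIGURED = "configured"
    RECOVERED = "recovered with initialization PRM"
    REBOOT_REQUIRED = "reboot required"


def load_prm(blocks, verify, configure) -> bool:
    """Verify then configure each BSB in order; stop at the first failed check."""
    for bsb in blocks:
        plaintext = verify(bsb)   # assumed to return None when the tag check fails
        if plaintext is None:
            return False          # the rejected BSB is never passed to configure()
        configure(plaintext)
    return True


def reconfigure(prm_blocks, init_blocks, verify, configure) -> Outcome:
    """Try the requested PRM, then the initialization PRM, then give up."""
    if load_prm(prm_blocks, verify, configure):
        return Outcome.CONFIGURED
    if load_prm(init_blocks, verify, configure):
        return Outcome.RECOVERED
    return Outcome.REBOOT_REQUIRED


if __name__ == "__main__":
    written = []
    check = lambda bsb: bsb if bsb.startswith(b"ok") else None
    print(reconfigure([b"ok-1", b"bad-2"], [b"ok-init"], check, written.append))
    print(written)
```

In this model a BSB that fails verification is simply dropped, which corresponds to the unauthorized block being overwritten in internal memory before it can reach the ICAP.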

5.6 Overall Computation Time

The decryption, verification, and configuration of the BSBs are processed in a coarse-grained pipeline, as shown in Figure 7. The configuration of all BSBs except the last BSB overlaps with the decryption process. Therefore, the total computation time T, including bitstream decryption, verification, and configuration, is

$$
T = \left(\frac{13}{128} S + 71 n + 19\right) + \left(\frac{b_n}{32} + 5\right) = \frac{13}{128} S + 71 n + \frac{b_n}{32} + 24. \tag{6}
$$

If the bitstream decryption, verification and configuration cannot be processed in a pipeline, the total number of computation cycles $T'$ is

$$
T' = T_{aes} + T_{reconf} = \left(\frac{13}{128} S + 71 n + 19\right) + \left(\frac{S}{32} + 5\right) = \frac{17}{128} S + 71 n + 24. \tag{7}
$$

Considering that $S \ge b_n$, the improvement of the computation time due to the use of a coarse-grained pipeline architecture is

$$
T' - T = \frac{S - b_n}{32} \ (\ge 0). \tag{8}
$$
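The cycle-count model of this section can be checked numerically. The Python sketch below encodes the closed forms as read from the text (13 cycles per 128-bit block, 71 cycles of per-BSB overhead, 19 cycles for the total-length setup, 32 bits per ICAP cycle and 5 flush cycles); the function names and the example sizes are assumptions made for illustration.

```python
from math import ceil


def split_sizes(S: int, b: int):
    """Number of BSBs n and size of the last BSB b_n for S = (n - 1) * b + b_n."""
    n = ceil(S / b)
    return n, S - (n - 1) * b


def t_aes(S: int, n: int) -> float:
    """Decryption and verification cycles: 13 S / 128 + 71 n + 19."""
    return 13 * S / 128 + 71 * n + 19


def t_reconf(S: int) -> float:
    """Configuration cycles: S / 32 + 5."""
    return S / 32 + 5


def t_pipelined(S: int, n: int, b_n: int) -> float:
    """Only the last BSB's configuration is not hidden behind decryption."""
    return t_aes(S, n) + b_n / 32 + 5


def t_nonpipelined(S: int, n: int) -> float:
    """Decryption and configuration executed back to back."""
    return t_aes(S, n) + t_reconf(S)


if __name__ == "__main__":
    S, b = 2 ** 17, 2 ** 14
    n, b_n = split_sizes(S, b)
    T, T_np = t_pipelined(S, n, b_n), t_nonpipelined(S, n)
    print(n, b_n, T, T_np, T_np - T, (S - b_n) / 32)
```

For these example sizes the saving equals (S - b_n)/32 cycles, i.e. the configuration time of every BSB except the last one.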

5.7 Countermeasure against BSB Removal and Insertion

As mentioned in Section 5.1, dividing the bitstream into several BSBs leaves the system vulnerable to BSB removal and insertion attacks. One scheme to protect against such attacks is to use sequential numbers as the initial vector (IV) for calculating the security tag. In this protection scheme, each BSB has a Block Number (BN) that denotes the position of the BSB in the bitstream. The initial BN is unique to each PRM. The BN of the first BSB is used as the IV and simultaneously stored in an internal register or memory. The stored BN is incremented and used as the IV every time a BSB is loaded. If the loaded BSB has a BN different from the stored value, the configuration is immediately terminated and the recovery process is started.
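The sequence check at the heart of this countermeasure can be sketched as follows. This Python fragment models only the comparison between the stored, incrementing BN and the BN carried by each loaded BSB; in the scheme described above the BN also serves as the IV, so a wrong BN would additionally make the tag verification fail. The function name and the example numbers are hypothetical.

```python
def first_out_of_sequence(initial_bn: int, loaded_bns) -> int:
    """Return the index of the first BSB whose BN breaks the sequence, or -1 if none does."""
    expected = initial_bn
    for index, bn in enumerate(loaded_bns):
        if bn != expected:
            return index      # configuration would stop here and recovery would start
        expected += 1         # stored BN is incremented for the next BSB
    return -1


if __name__ == "__main__":
    print(first_out_of_sequence(100, [100, 101, 102]))        # intact sequence
    print(first_out_of_sequence(100, [100, 102]))             # BSB 101 removed
    print(first_out_of_sequence(100, [100, 101, 101, 102]))   # extra BSB inserted
```

A comparison of this kind cannot by itself notice that trailing BSBs are missing; the total-length field carried by the first BSB is one piece of information that could be used for that, although the text does not spell this out.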

The computation time increases slightly when the BN is used for bitstream protection, because reading the BN from SSRAM takes several clock cycles. Suppose that the length of the BN is $l_{BN}$ bits. The number of clock cycles required to read the BN is $\lceil l_{BN}/32 \rceil$, as the width of the SSRAM is 32 bits. In this case, the total computation time with pipeline processing ($T_{BN}$) is

$$
T_{BN} = \left(\frac{13}{128} S + \left(71 + \left\lceil \frac{l_{BN}}{32} \right\rceil\right) n + 19\right) + \left(\frac{b_n}{32} + 5\right) = \frac{13}{128} S + \left(71 + \left\lceil \frac{l_{BN}}{32} \right\rceil\right) n + \frac{b_n}{32} + 24. \tag{9}
$$

The increased time due to the use of the BN is

$$
T_{BN} - T = \left\lceil \frac{l_{BN}}{32} \right\rceil n. \tag{10}
$$

Equation (10) indicates that the additional BN has a larger effect on the computation time as the number of BSBs n increases. As given in Section 6.2, n is typically 4 to 16. Thus, the time increase caused by using the BN is quite small compared to the total computation time.

This study is the ﬁrst step toward developing a secure practical DPR system and its

main purpose is to demonstrate the feasibility of the recovery mechanism of the AES-

GCM-based DPR system, so the additional protection logic with BN is currently not

implemented. Implementing the additional protection logic is left as future work.

6 Implementation

This section describes the implementation results of the abovementioned AES-GCM-

based DPR system (hereinafter PR-AES-GCM). PR-AES-GCM is implemented tar-

geting Virtex-5 (XC5VLX50T-FFG1136) on an ML505 board [26]. The systems are

designed using Xilinx Early Access Partial Reconﬁguration (EA PR) ﬂow [27] and are

implemented with ISE 9.1.02i PR10 and PlanAhead 9.2.7 [28].

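[Figure 8 axes: horizontal, bitstream block size (s) [bit]; vertical, computation cycles (T) [cycle]; curves: pipeline and non-pipeline for each bitstream size S, with the minimum of each pipeline curve marked "min"]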

Fig. 8. Relationship between the BSB size b and the total number of computation cycles T and

6.1 PRM Implementation

In order to test whether all mechanisms of bitstream encryption, veriﬁcation, and er-

ror recovery work properly, we implemented two simple function blocks, a 28-bit up-

counter, and a 28-bit down-counter as PRMs. In addition, two bus macros were placed

in the PRR for the input and output signals, respectively. The most significant 4 bits of the counter were output from the PRM and connected to LEDs on the board.

The PRR contained 80 slices, 640 LUTs, and 320 registers. The size of the bitstream

for this area became about 12 KB (= 96 K bits), which could change slightly depend-

ing on the complexity of the implemented functions. The sizes of the up-counter and

down-counter PRMs were 87,200 and 85,856 bits, respectively.

6.2 Internal Memory

In order to determine the required size of the internal memory, equation (6) should be transformed to express the relationship between T and b. For estimation purposes, we suppose that the size of the last BSB $b_n$ is b bits. In this case, equation (6) is rewritten as follows:

$$
T = \left(\frac{13}{128} S + 71 n + 19\right) + \left(\frac{b}{32} + 5\right) = \frac{13}{128} S + \frac{71 S}{b} + \frac{b}{32} + 24 \quad (\because S = n \cdot b). \tag{11}
$$

Table 1. Hardware utilization of the static module of PR-AES-GCM on Virtex-5 (XC5VLX50T).

| Module | Register | (%) | LUT | (%) | Slice | (%) | BRAM | (%) |
|---|---|---|---|---|---|---|---|---|
| Overall | 2,876 | 10.0% | 5,965 | 20.7% | 1,958 | 27.2% | 5 | 8.3% |
| AES-GCM | 1,382 | 4.8% | 3,691 | 12.8% | 1,615 | 22.4% | 0 | 0.0% |
| MAIN CTRL | 463 | 1.6% | 643 | 2.2% | 360 | 5.0% | 0 | 0.0% |
| AES CTRL | 164 | 0.6% | 277 | 1.0% | 192 | 2.7% | 0 | 0.0% |
| SSRAM CTRL | 103 | 0.4% | 174 | 0.6% | 97 | 1.3% | 1 | 1.7% |
| RECONF CTRL | 68 | 0.2% | 142 | 0.5% | 76 | 1.1% | 0 | 0.0% |
| RAM CTRL | 143 | 0.5% | 156 | 0.5% | 161 | 2.2% | 0 | 0.0% |
| CONFIG RAM | 0 | 0.0% | 0 | 0.0% | 0 | 0.0% | 4 | 6.7% |

Figure 8 illustrates the variation of the total computation time T with the BSB size b for three bitstream sizes S, each a power of two. For comparison, equation (7) is transformed as follows, and its graph is also shown in Figure 8.

$$
T' = T_{aes} + T_{reconf} = \left(\frac{13}{128} S + 71 n + 19\right) + \left(\frac{S}{32} + 5\right) = \frac{17}{128} S + \frac{71 S}{b} + 24 \quad (\because S = n \cdot b). \tag{12}
$$

As Figure 8 clearly shows, the coarse-grained pipeline architecture is effective for shortening the overall processing time. The number of computation cycles in non-pipelined circuits decreases monotonically with b, while that in pipelined circuits has a minimum, as indicated by the arrows in Figure 8. For each of the three bitstream sizes S, there is a power-of-two BSB size b which minimizes T. The most time-efficient DPR system is realized by setting the size of the internal memory to the value of b derived in this way. Equation (11) is useful for balancing the computation time and circuit size under the required speed and area performance.

Incidentally, the system with the BN-based protection yields exactly the same optimal BSB sizes b for the same S values as those given above.

After deriving the relationship between T and b, we determined the most time-efficient
memory configuration for the PRMs introduced in Section 6.1. The size S should be set to
a slightly larger value than the prepared PRMs in order to accommodate other PRMs with
different sizes. Therefore, S is set to 2^17, which is the minimal 2^n meeting the
requirement 2^n > 87200. As Figure 8 illustrates, the optimal BSB size b under the
condition S = 2^17 is 2^14. Therefore, the internal memory configuration is set to
128 × 128 (= 2^14) bits.
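The selection of S and b described above can be reproduced with a short search. This is a minimal sketch rather than the authors' procedure: the cycle model in `total_cycles` assumes the pipelined form T = 13S/128 + b/32 + 71S/b + 24, in which the coefficients 13/128 and 1/32 are assumptions inferred from the timings in Table 3, and all function names are hypothetical.

```python
def total_cycles(S: int, b: int) -> float:
    """Pipelined cycle count under the ASSUMED model (coefficients not confirmed by the source)."""
    return 13 * S / 128 + b / 32 + 71 * S / b + 24


def smallest_pow2_above(x: int) -> int:
    """Smallest power of two strictly greater than x (the requirement 2^n > x)."""
    p = 1
    while p <= x:
        p *= 2
    return p


def best_bsb_size(S: int) -> int:
    """Power-of-two BSB size between 128 bits and S that minimizes total_cycles."""
    candidates = [2 ** k for k in range(7, S.bit_length())]
    return min(candidates, key=lambda b: total_cycles(S, b))


S = smallest_pow2_above(87200)   # total bitstream size in bits
b = best_bsb_size(S)             # BSB size in bits
print(S, b)                      # 131072 16384
```

With these assumptions the search returns S = 2^17 and b = 2^14, that is, a 128 × 128-bit internal memory.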

6.3 Hardware Resource Utilization

Table 1 shows the hardware utilization of PR-AES-GCM implemented on a Virtex-5.
The “Overall” item shows the total amount of hardware resources used by all modules
except the PRM. Table 1 also describes the hardware utilization of each module as a
standalone implementation.

The hardware architecture of Virtex-5 is vastly different from that of earlier devices
such as Virtex-II Pro and Virtex-4. Each slice in Virtex-5 contains four 6-input LUTs,
whereas that of earlier devices contains two 4-input LUTs. Thus, the number of slices
is smaller in the Virtex-5 implementation. In order to give a fair comparison with other
studies, we also implemented the above system on a Virtex-II Pro (XC2VP30-FF896).
The hardware utilization of PR-AES-GCM on Virtex-II Pro is given in Table 2.

Table 2. Hardware utilization of the static module of PR-AES-GCM on Virtex-II Pro (XC2VP30).

Module        Register (%)    LUT (%)         Slice (%)       BRAM (%)
Overall       2,900  10.6%    8,080  29.5%    4,900  35.8%    4   2.9%
AES-GCM       1,387   5.1%    5,566  20.3%    3,233  23.6%    0   0.0%
MAIN CTRL       463   1.7%    1,133   4.1%      713   5.2%    0   0.0%
AES CTRL        173   0.6%      316   1.2%      166   1.2%    0   0.0%
SSRAM CTRL      103   0.4%      218   0.8%      132   1.0%    0   0.0%
RECONF CTRL      59   0.2%      153   0.6%       94   0.7%    0   0.0%
RAM CTRL        143   0.5%      168   0.6%       97   0.7%    0   0.0%
CONFIG RAM        0   0.0%        0   0.0%        0   0.0%    4   2.9%

Table 3. Comparison of the performances of different secure PR systems (14,112 bytes PRM).

System           Device      Slice      Verification   Decryption   Configuration   Overall      Ratio
PR-AES-GCM       XC5VLX50T   4,900*     <----- 119.110 µs ----->    35.3 µs         123.72 µs    1
                                        <----- 947.8 Mbps ----->    3195 Mbps       913 Mbps
PowerPC [12]     XC2VP30     1,334**    139 ms         208 ms       56 ms           403 ms       3257
                                        812 kbps       543 kbps     2016 kbps       280 kbps
MicroBlaze [12]  XC2VP30     1,706**    776 ms         1472 ms      32 ms           2280 ms      18429
                                        145 kbps       77 kbps      3528 kbps       50 kbps
AES-OCB [13]     XC4VLX60    2,964      <------ 601 Mbps ------>    -               -
AES-CCM [13]     XC4VLX60    2,799      <------ 255 Mbps ------>    -               -
AES-EAX [13]     XC4VLX60    2,993      <------ 287 Mbps ------>    -               -

*  The slice utilization of Virtex-II Pro is shown for the purpose of fair comparison.
** Includes only the reconfiguration controllers.

Here we consider the hardware utilization of the additional protection logic using
BN. The logic needs registers or memory to store the BN and comparators to verify
whether the BSB has the correct BN. In addition, an adder is required to increment the
stored BN. We evaluated this logic on Virtex-5 and Virtex-II Pro under the condition
that the size of BN is 128 bits. As a result, the logic utilizes 129 registers, 173 LUTs
and 45 slices on Virtex-5, and 129 registers, 194 LUTs and 99 slices on Virtex-II Pro.
These utilizations are all less than 1% of the entire resources. Therefore, the
additional circuit will have little effect on the resource utilization of the whole system.
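The three hardware elements named above (storage for the BN, a comparator, and an incrementing adder) can be modeled in software as follows. This is a behavioral sketch under assumptions, not the authors' circuit: the class name, the starting value of 0, and the wrap-around at 2^128 are hypothetical choices made for illustration.

```python
class BlockNumberChecker:
    """Software model of BN checking: a register, a comparator, and an adder."""

    WIDTH = 128  # BN size in bits, as stated in the text

    def __init__(self, initial: int = 0) -> None:
        # Register holding the BN expected for the next BSB (start value is an assumption).
        self.expected = initial % (1 << self.WIDTH)

    def accept(self, bn: int) -> bool:
        """Return True and advance the register only if the BSB carries the expected BN."""
        if bn != self.expected:                                   # comparator
            return False
        self.expected = (self.expected + 1) % (1 << self.WIDTH)   # adder
        return True


checker = BlockNumberChecker()
print([checker.accept(bn) for bn in (0, 1, 3, 2)])  # [True, True, False, True]
```

In this model a removed block (BN 3 arriving when 2 is expected) or a replayed block is rejected, because its BN does not match the stored value.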

6.4 DPR Experiments

In order to experimentally demonstrate that all functions of bitstream decryption,
verification, and configuration as well as the error recovery mechanism operate correctly,
we configured the PRMs on the developed DPR system. Figure 9 shows the structure of the
bitstreams in the DPR experiment. The PRM with the up-counter (hereinafter PRM0) is
placed at address 0 as the initialization bitstream, and the PRM with the down-counter
(hereinafter PRM1) is placed at address 8192. Configuration with an erroneous bitstream
was emulated by inverting the first byte of the bitstream of PRM1 and using the
bitstream thus obtained for PRM2, which is placed at address 16384.

[Figure 9: layout of the SSRAM contents. Initialization PRM (up-counter) at address 0,
PRM1 (down-counter) at address 8192, and PRM2 (erroneous bitstream) at address 16384.
Each entry consists of a header holding the total length, followed by bitstream blocks
b1, b2, ..., bn.]

Fig. 9. Bitstream structure in SSRAM in the DPR experiment.

The experimental procedure is outlined below.

1. The system is booted. Note that the most significant 4 bits of the counter in
PRM0 are connected to LEDs on the board.

2. The configuration command is sent from the host computer with the SSRAM address
“8192” to configure PRM1.

3. The bitstream at address 8192 is loaded from SSRAM, decrypted, verified, and
configured.

4. The configuration command is sent from the host computer with the SSRAM address
“16384” to configure PRM2.

5. The bitstream at address 16384 is loaded from SSRAM, decrypted, verified, and
configured.

When the system was booted, the LEDs indicated that the up-counter was implemented
in PRM0, and after PRM1 was configured, the LEDs indicated that the down-counter was
implemented in PRM1. This result shows that the decryption and verification with
AES-GCM worked correctly and that DPR was performed successfully.

After the configuration of PRM2 was attempted, the LEDs indicated that the up-counter
of PRM0 was implemented. Note that PRM2 is an erroneous bitstream derived from the
bitstream of PRM1, which is equipped with the down-counter. This result shows that the
configuration of PRM2 failed and the system was reconfigured with PRM0, which is
equipped with the up-counter. Therefore, the error recovery mechanism was demonstrated
to operate correctly.
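The control flow of this experiment can be mimicked in a few lines of software. The sketch below is only an analogy under stated assumptions: HMAC-SHA-256 from the Python standard library stands in for the AES-GCM authentication tag, encryption is omitted entirely, and the key, payload strings, and function names are hypothetical. Only the addresses 0, 8192, and 16384 and the first-byte inversion are taken from the text.

```python
import hashlib
import hmac

KEY = b"demo-key-not-from-the-paper"  # hypothetical key
TAG_LEN = 32                          # HMAC-SHA-256 tag length in bytes
INIT_ADDRESS = 0                      # address of the initialization PRM (PRM0)


def seal(payload: bytes) -> bytes:
    """Append an authentication tag (stand-in for the GCM tag)."""
    return payload + hmac.new(KEY, payload, hashlib.sha256).digest()


def verify(blob: bytes):
    """Return the payload if the tag matches, otherwise None."""
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


prm0 = seal(b"up-counter")
prm1 = seal(b"down-counter")
prm2 = bytes([prm1[0] ^ 0xFF]) + prm1[1:]   # PRM1 with its first byte inverted

ssram = {0: prm0, 8192: prm1, 16384: prm2}


def configure(address: int) -> str:
    """Configure the PRM at the given address; fall back to PRM0 if verification fails."""
    payload = verify(ssram[address])
    if payload is None:
        payload = verify(ssram[INIT_ADDRESS])   # error recovery
    return payload.decode()


print(configure(8192))    # down-counter
print(configure(16384))   # up-counter
```

As in the experiment, the request for address 16384 ends with the up-counter active, because the tampered bitstream fails verification and the initialization PRM is configured instead.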

6.5 Performance Evaluation

The clock frequency of PR-AES-GCM is 100 MHz. In order to enable comparison
with [12], the computation time required to configure a 14,112-byte (112,896-bit) PRM
is given in Table 3. Decryption, verification, and configuration with PR-AES-GCM
are performed in a pipeline, and the respective computation times are derived from
equation (6).

In the PowerPC and MicroBlaze systems, authentication, decryption, and reconfiguration
are performed sequentially, and therefore the overall processing time is simply the
sum of the processing times of the individual steps. Table 3 also gives the throughput
of other AE algorithms as reported in [13].
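The PR-AES-GCM row of Table 3 can be cross-checked with a small cycle-count calculation. The sketch below rests on assumptions that the source does not confirm: the AES-GCM cost is taken as 13S/128 + 71n + 19 cycles, the configuration cost as one 32-bit word per cycle plus 5 cycles, and n as the number of full-size BSBs (6 for this PRM). These values were chosen because they reproduce the table; the function names are hypothetical.

```python
CLOCK_MHZ = 100          # clock frequency stated in Section 6.5
S_BITS = 14_112 * 8      # PRM size used in Table 3 (112,896 bits)
B_BITS = 128 * 128       # BSB size chosen in Section 6.2


def aes_gcm_cycles(s_bits: int, n_blocks: int) -> int:
    """ASSUMED decryption and verification cost in cycles."""
    return 13 * s_bits // 128 + 71 * n_blocks + 19


def reconf_cycles(bits: int) -> int:
    """ASSUMED configuration cost: one 32-bit word per cycle plus 5 cycles."""
    return bits // 32 + 5


def microseconds(cycles: int) -> float:
    return cycles / CLOCK_MHZ


def mbps(cycles: int) -> float:
    """Throughput for the whole PRM processed in the given number of cycles."""
    return S_BITS * CLOCK_MHZ / cycles


n_full = S_BITS // B_BITS                 # full-size BSBs (assumption for n)
last_bits = S_BITS - n_full * B_BITS      # size of the final, shorter BSB

verify_cycles = aes_gcm_cycles(S_BITS, n_full)
config_cycles = reconf_cycles(S_BITS)
# In the pipeline, only the configuration of the last BSB is not overlapped.
overall_cycles = verify_cycles + reconf_cycles(last_bits)

for name, cycles in (("verification/decryption", verify_cycles),
                     ("configuration", config_cycles),
                     ("overall", overall_cycles)):
    print(f"{name}: {microseconds(cycles):.2f} us, {mbps(cycles):.1f} Mbps")
```

Under these assumptions the three results are 11,911, 3,533, and 12,372 cycles, which correspond to the 119.11 µs, 35.3 µs, and 123.72 µs entries of Table 3.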

6.6 Analysis of the Results

The results of the experiment in Section 6.4 indicate that all functions of bitstream
decryption, verification, configuration, and error recovery work properly. Thus, to the
best of the authors' knowledge, the system described above is the first operational DPR
system featuring both bitstream protection and error recovery mechanisms.

As shown in Table 3, PR-AES-GCM achieved the highest overall throughput, over
900 Mbps, while using only about one-third of the slices. Note that PR-AES-GCM includes
error recovery logic, an SSRAM controller, etc. Additionally, the AES-GCM module
achieved a throughput of about 950 Mbps, which is higher than those of the other AE
modes OCB, CCM, and EAX. It is remarkable that such high throughput is achieved
with internal memory as small as that determined by equation (11). The performance of
a system is often thought to improve as the memory size increases. However, equation
(11) reveals that in coarse-grained DPR architectures an optimally sized internal memory
maximizes the throughput of the entire system. The device can accommodate far more
memory than the 128 × 2^7 bits used by our system (in Table 1, the configuration RAM
occupies 4 BRAMs, or 6.7% of the total). Therefore, sufficient memory resources are
available for various user logic.

Furthermore, the PowerPC and MicroBlaze DPR systems require an overall computation
time between several hundred milliseconds and several seconds, which is unacceptable
for practical DPR systems. Therefore, authentication, decryption, and reconfiguration
should be processed using dedicated hardware in order to realize practical DPR systems.
Compared to software AE systems, our approach attained extremely high performance:
PR-AES-GCM achieved an overall throughput 3257 times that of the PowerPC system and
18429 times that of the MicroBlaze system.
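The two ratios follow directly from the overall times in Table 3. The short check below uses only values from that table; the dictionary and variable names are hypothetical.

```python
# Overall processing times from Table 3, converted to microseconds.
overall_us = {
    "PR-AES-GCM": 123.72,
    "PowerPC": 403 * 1000.0,
    "MicroBlaze": 2280 * 1000.0,
}

baseline = overall_us["PR-AES-GCM"]
ratios = {name: round(t / baseline) for name, t in overall_us.items()}
print(ratios)  # {'PR-AES-GCM': 1, 'PowerPC': 3257, 'MicroBlaze': 18429}
```

Because every system processes the same 14,112-byte PRM, the ratio of overall times equals the ratio of overall throughputs.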

7 Conclusions

We developed a secure and dependable dynamic partial reconfiguration (DPR) system
featuring AES-GCM authentication and error recovery mechanisms. Furthermore, it
was experimentally demonstrated that the functions of bitstream decryption, verification,
configuration, and error recovery operate correctly. To the best of the authors'
knowledge, this is the first operational DPR system featuring both bitstream protection
and error recovery mechanisms.

Through the implementation of the above system on a Virtex-5 (XC5VLX50T),
AES-GCM achieved a throughput of about 950 Mbps, and the entire system achieved
a throughput of more than 910 Mbps, which is sufficient for practical DPR use, while
utilizing only about one-third of the slices. This performance is higher than that of
other modes of operation such as OCB, CCM, and EAX.

Remarkably, it was found that using optimally sized internal memory yields the
highest throughput in the DPR system. Although it is often thought that the performance
of a system improves as the memory increases, our study revealed that optimizing the
size of the internal memory according to the size of the entire bitstream provides the
shortest processing times. Thus, our system was able to achieve the highest throughput
with the least amount of memory resources.

Future work includes implementing further security mechanisms to prevent attacks
such as bitstream block removal and insertion. This paper showed that the protection
scheme using block numbers as the initial vector can be implemented with hardly any
sacrifice in computation time or hardware resources. Another direction is to develop
various application systems, such as content distribution and multi-algorithm
cryptoprocessors, based on the DPR system described above.

References

1. Hori, Y., Yokoyama, H., Sakane, H., Toda, K.: A secure content delivery system based on a partially reconfigurable FPGA. IEICE Trans. Inf. & Syst. E91-D(5) (May 2008) 1398–1407

2. Hori, Y., Sakane, H., Toda, K.: A study of the effectiveness of dynamic partial reconfiguration for size and power reduction. In: IEICE Tech. Rep. RECONF2007-56. (January 2008) 31–36 (in Japanese)

3. Claus, C., Zeppenfeld, J., Muller, F., Stechele, W.: Using partial-run-time reconfigurable hardware to accelerate video processing in driver assistance system. In: DATE’07. (2007) 498–503

4. Becker, J., Hubner, M., Hettich, G., Constapel, R., Eisenmann, J., Luka, J.: Dynamic and partial FPGA exploitation. Proc. IEEE 95(2) (2007) 438–452

5. Emmert, J., Stroud, C., Skaggs, B., Abramovici, M.: Dynamic fault tolerance in FPGAs via partial reconfiguration. In: FCCM 2000. (2000) 165–174

6. Delahaye, J.P., Gogniat, G., Roland, C., Bomel, P.: Software radio and dynamic reconfiguration on a DSP/FPGA platform. J. Frequenz 58(5-6) (2004) 152–159

7. Drimer, S.: Authentication of FPGA bitstreams: Why and how. In: ARC’07. Volume LNCS 4419. (2007) 73–84

8. National Institute of Standards and Technology: Announcing the advanced encryption standard (AES). FIPS PUB 197 (November 2001)

9. McGrew, D.A., Viega, J.: The Galois/counter mode of operation (GCM) (May 2005) http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html

10. Dworkin, M.: Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. National Institute of Standards and Technology. SP 800-38D edn. (November 2007)

11. Bossuet, L., Gogniat, G.: Dynamically configurable security for SRAM FPGA bitstreams. Int. J. Embedded Systems 2(1/2) (2006) 73–85

12. Zeineddini, A.S., Gaj, K.: Secure partial reconfiguration of FPGAs. In: ICFPT’05. (2005) 155–162

13. Parelkar, M.M.: Authenticated encryption in hardware. Master’s thesis, George Mason University (2005)

14. McGrew, D.A., Viega, J.: The security and performance of the Galois/counter mode (GCM) of operation. In: INDOCRYPT 2004. (2004) 343–355

15. National Institute of Standards and Technology: Recommendation for the triple data encryption algorithm (TDEA) block cipher (May 2004)

16. Rogaway, P., Bellare, M., Black, J.: OCB: A block-cipher mode of operation for efficient authenticated encryption. ACM Trans. Information and System Security 6(3) (August 2003) 365–403

17. Whiting, D., Housley, R., Ferguson, N.: Counter with CBC-MAC (CCM). RFC3610 (September 2003)

18. Bellare, M., Rogaway, P., Wagner, D.: A conventional authenticated-encryption mode. http://www-08.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/eax/eax-spec.pdf (2003)

19. Xilinx, Inc.: Virtex-5 User Guide. (2007)

20. Xilinx, Inc.: Virtex-4 User Guide. (2007)

21. Lysaght, P., Blodget, B., Mason, J., Young, J., Bridgford, B.: Enhanced architectures, design methodologies and CAD tools for dynamic reconfiguration of Xilinx FPGAs. In: FPL’06. (2006) 12–17

22. U.S. Department of Commerce/National Institute of Standards and Technology: Data Encryption Standard (DES). FIPS PUB 46-3 edn. (1999)

23. Dworkin, M.: Recommendation for Block Cipher Modes of Operation. National Institute of Standards and Technology. SP 800-38A edn. (December 2001)

24. Satoh, A.: High-speed parallel hardware architecture for Galois counter mode. In: ISCAS’07. (2007) 1863–1866

25. Satoh, A., Sugawara, T., Aoki, T.: High-speed pipelined hardware architecture for Galois counter mode. In: ISC’07. (2007) 118–129

26. Xilinx, Inc.: ML505/ML506 Evaluation Platform. UG347(v2.4) edn. (October 2007)

27. Xilinx, Inc.: Early Access Partial Reconfiguration User Guide For ISE 8.1.01i. (2006)

28. Jackson, B.: Partial Reconfiguration Design with PlanAhead 9.2. Xilinx, Inc. (August 2007)
