COST & PERFORMANCE EVALUATION OF
DATA CONFIDENTIALITY IN HIGH
THROUGHPUT/CLOUD BASED MULTI-TIER
APPLICATIONS
By:
Mr. Faisal Shahzad
UET-11F-MSIS-CASE-04
Supervisor
Dr. Waheed Iqbal
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
CENTRE FOR ADVANCED STUDIES IN ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
TAXILA
Semester Fall 2014
COST & PERFORMANCE EVALUATION OF DATA
CONFIDENTIALITY IN HIGH THROUGHPUT/CLOUD
BASED MULTI-TIER APPLICATIONS
A report submitted in partial fulfillment of the requirements for the M.Sc.
Thesis
By:
Mr. Faisal Shahzad
UET-11F-MSIS-CASE-04
Approved by:
_____________________
Supervisor:
Dr. Waheed Iqbal
_____________________
External Examiner:
Dr. Zia Ud Din
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
CENTRE FOR ADVANCED STUDIES IN ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY TAXILA
Semester Fall 2014
TABLE OF CONTENTS
TABLE OF CONTENTS ............................................................................................................. iii
DECLARATION........................................................................................................................ viii
DEDICATION ............................................................................................................................. ix
ACKNOWLEDGEMENT ............................................................................................................ x
PROFILES ..................................................................................................................................... xi
Supervisor (Dr. Waheed Iqbal) .............................................................................................. xi
External Supervisor (Dr. Zia Ud Din) .................................................................................. xi
Members Research Committee (Dr. Farrukh Kamran) ...................................................... xi
Members Research Committee (Dr. Shafaat Ahmed Bazaz) ............................................ xi
LIST OF ABBREVIATIONS AND ACRONYMS ................................................................... xii
ABSTRACT ................................................................................................................................ xiii
LIST OF FIGURES ..................................................................................................................... xiv
LIST OF TABLES ........................................................................................................................ xv
Chapter 1 ....................................................................................................................................... 1
DATA SECURITY IN HIGH THROUGHPUT/CLOUD BASED APPLICATION ............ 1
1.1 Introduction ........................................................................................................................ 1
1.2 Motivation ........................................................................................................................... 2
1.2.1 Need For Data Protection ........................................................................................... 2
1.2.2 Multi-Tier Architecture ............................................................................................... 4
1.2.3 Cloud Computing ........................................................................................................ 5
1.2.4 Government Spying .................................................................................................... 6
1.2.5 End User Privacy ......................................................................................................... 6
1.3 Objectives of study ............................................................................................................. 7
1.4 Scope and Contribution ..................................................................................................... 9
1.5 Limitations ......................................................................................................................... 10
1.6 Significance of the study ................................................................................................. 10
1.7 Thesis Outline ................................................................................................................... 11
Chapter 2 ..................................................................................................................................... 12
LITERATURE REVIEW ............................................................................................................. 12
2.1 Introduction ...................................................................................................................... 12
2.2 Application Architecture ................................................................................................. 13
2.3 Ensuring Confidentiality Through Encryption............................................................ 14
2.4 Homomorphic Encryption .............................................................................................. 16
2.4.1 Fully Homomorphic Encryption [FHE] ................................................................. 16
2.4.2 Partial Homomorphic Encryption (PHE) ............................................................... 18
2.4.3 Functional Encryption (FE) ...................................................................................... 18
2.5 Current Frameworks ........................................................................................................ 19
2.5.1 CryptDB ...................................................................................................................... 19
2.5.2 Monami ....................................................................................................................... 20
2.5.3 Mylar ........................................................................................................................... 21
2.5.4 Selection for test implementation ............................................................................ 22
Chapter 3 ..................................................................................................................................... 26
CRYPTDB .................................................................................................................................... 26
3.1 Introduction ...................................................................................................................... 26
3.2 Security in Multi-Tier Architecture Using CryptDB ................................................... 26
3.3 Query Execution and Data Confidentiality .................................................................. 29
3.3.1 CryptDB Scope For Confidentiality ........................................................................ 30
3.3.2 CryptDB Encryption Schemes ................................................................................. 31
3.3.2.1 RND Scheme ....................................................................................................... 32
3.3.2.2 DET Scheme ........................................................................................................ 32
3.3.2.4 Paillier Cryptosystem ........................................................................................ 33
3.3.2.5 SEARCH .............................................................................................................. 33
Chapter 4 ..................................................................................................................................... 34
RESEARCH METHODOLOGY ............................................................................................... 34
4.1 Introduction ...................................................................................................................... 34
4.2 Sample Application Scenario .......................................................................................... 34
4.3 Test Scenario ..................................................................................................................... 36
4.3.1 Technologies Used During Experiments ................................................................ 37
4.3.2 Implementation Scenarios ........................................................................................ 37
4.4 Test Execution ................................................................................................................... 38
4.4.1 Experiment On Local Test Bed ................................................................................ 38
4.4.2 Experiments On Amazon Cloud ............................................................................. 39
4.5 Test Use Cases ................................................................................................................... 41
4.5.1 Test Bed Environment ............................................................................................... 41
4.5.2 Client Side Load ......................................................................................................... 42
Chapter 5 ..................................................................................................................................... 43
EXPERIMENTS & RESULTS .................................................................................................... 43
5.1 Introduction ...................................................................................................................... 43
5.2 Experiments On Local Test Bed ..................................................................................... 43
5.2.1 Experiment-01 On Local Test Bed ........................................................................... 45
5.2.2 Experiment-02 On Local Test Bed ........................................................................... 46
5.2.3 Experiment-03 On Local Test Bed ........................................................................... 47
5.2.4 Comparison Of Means Of Experiments On Local Test Bed ................................ 48
5.3 Experiments On Amazon Cloud .................................................................................... 49
5.3.1 Experiment-01 On Amazon Cloud .......................................................................... 50
5.3.2 Experiment-02 On Amazon Cloud ..................................................................... 51
5.3.3 Experiment-03 On Amazon Cloud ..................................................................... 52
5.3.4 Experiment-04 On Amazon Cloud ..................................................................... 53
5.3.4 Comparison Of Means Of Experiments On Amazon Cloud .......................... 54
5.3.5 Comparison Of Means Of Extended Experiments On Amazon Cloud ........ 55
5.4 Upgradation Of Cloud Infrastructure ........................................................................... 56
5.4.1 Upgraded Test Infrastructure .................................................................................. 56
5.4.2 Performance Gain Using Light Load ...................................................................... 57
5.4.2 Performance Gain Using Medium Load ................................................................ 58
5.4.3 Performance Gain Using Heavy Load .................................................................... 60
Chapter 6 ..................................................................................................................................... 63
CONCLUSION & RECOMMENDATIONS ........................................................................... 63
6.1 Introduction ...................................................................................................................... 63
6.2 Conclusion ......................................................................................................................... 63
6.3 Recommendations ............................................................................................................ 65
6.4 Future Work ...................................................................................................................... 66
APPENDICES ............................................................................................................................. 67
A - SERVER INFRASTRUCTURE LOCAL TEST BED .................................................. 67
B - SERVER INFRASTRUCTURE CLOUD TEST BED .................................................. 67
C - SERVER INFRASTRUCTURE CLOUD TEST BED (EXTENDED) ......................... 67
D - PHASE 01 EXPERIMENTS RESULTS ........................................................................ 68
E - PHASE 02 (CLOUD BASED) EXPERIMENTS RESULTS ........................................ 70
F - PERFORMANCE GAIN AFTER UPGRADE .............................................................. 72
G - PERFORMANCE GAIN vs COST OF UPGRADE .................................................... 74
REFERENCES ............................................................................................................................. 75
DECLARATION
The substance of this thesis is the original work of the author, and due
references and acknowledgements have been made, where necessary, to the
work of others. No part of this thesis has already been accepted for any
degree, nor is it currently being submitted in candidature for any degree.
_______________
Mr. Faisal Shahzad
UET-11F-MSIS-CASE-04
M.Sc. Thesis Scholar
Countersigned:
______________
Dr. Waheed Iqbal
Thesis Supervisor
DEDICATION
To the knowledge,
And to those who are continuously struggling
To add in it
To spread it
To make people like us able to understand it
Love You All
ACKNOWLEDGEMENT
A special thanks to Allah Almighty; His kindness and blessings spread over my whole life and make every achievement in my life possible for me. Alhamdulillah.
I would like to express my special appreciation and thanks to my advisor, Professor Dr. Waheed Iqbal; you have been a tremendous mentor for me. I would like to thank you for encouraging my research and for allowing me to grow as a research student. Your advice has been priceless.
I would also like to thank my committee members, Professor Dr. Furrukh Kamran, Professor Dr. Zia Ud Din, and Professor Dr. Shafaat A Bazaaz, for serving on my committee. I would especially like to thank the CASE management for their kind support, especially Mr. Zeeshan Saleem for his kind guidance and support as and when required.
A special thanks goes to my family. Words cannot express how grateful I am to you all. Your prayers for me are what sustained me thus far.
PROFILES
Supervisor (Dr. Waheed Iqbal)
Ph.D., Cloud Computing, 2012 Asian Institute of Technology, Thailand
M.Eng., Computer Science, 2009 Asian Institute of Technology, Thailand and Technical
University Catalonia, Spain
B.S., Software Engineering, 2005 Bahria University Karachi Campus, Pakistan
External Supervisor (Dr. Zia Ud Din)
Postdoctoral Research, Computer Science, Feb 2012-Aug 2012, University of Nice, Sophia
Antipolis, France
Ph.D., Computer Science, 2009 Asian Institute of Technology, Thailand
M.S., Computer Science, 2003 Bahria University, Islamabad Campus, Pakistan
B.Eng., Civil Engineering, 2000 UCET, Baha Uddin Zakariya University, Multan,
Pakistan
Members Research Committee (Dr. Farrukh Kamran)
Ph.D., Electrical Engineering, 1995 Georgia Institute of Technology, Atlanta, GA USA
M.S., Electrical Engineering, 1992 Georgia Institute of Technology, Atlanta, GA USA
B.Sc. (Eng.), Electrical Engineering, University of Engineering & Technology, Lahore
Members Research Committee (Dr. Shafaat Ahmed Bazaz)
Ph.D., Controls and Computer Sciences, 1998 Institut National des Sciences Appliquées
(INSA), Toulouse, France
M.S., 1994 Université de Franche-Comté, Besançon, France
B.S., 1989 NED University of Engineering and Technology, Karachi, Pakistan
LIST OF ABBREVIATIONS AND ACRONYMS
Abbreviation   Details
AES            Advanced Encryption Standard
DB             Database
DBMS           Database Management System
FHE            Fully Homomorphic Encryption
LAMP           Linux, Apache, MySQL, PHP
OLAP           Online Analytical Processing
OLTP           Online Transaction Processing
PHE            Partial Homomorphic Encryption
SQL            Structured Query Language
ABSTRACT
Cloud computing attracts a large number of users to host their applications and data, mainly due to on-demand resource provisioning and pay-as-you-go features. Web applications are one of the important types of applications deployed over the cloud. However, application owners are concerned about their data privacy and security. One of the key techniques to ensure data security (confidentiality aspect only) is encryption and decryption; however, it introduces overhead in the performance of the application. From the end user's point of view, response time is one of the main performance metrics.
In this thesis, we study possible mechanisms to address the data privacy and security concerns of the owners of cloud-hosted applications without requiring modification of application code. We identified CryptDB as one of the possible solutions that can be integrated with web applications without requiring code changes. CryptDB claims to provide confidentiality over databases and allows execution of queries over encrypted data with minimal overhead. In this thesis, we perform a cost and performance analysis of using CryptDB with a multi-tier web application hosted on the Amazon cloud using different configurations. Our experimental evaluation shows that a specific response time can be provided for a large number of users; however, a substantial increase in cost by upgrading the infrastructure brings up to a 40% gain in performance, if required per the needs of the organization.
LIST OF FIGURES
Figure 1 Three Dimensions Of Data Snoopers ........................................................................ 4
Figure 2 Multi-Tier Architecture .............................................................................................. 14
Figure 3 CryptDB Architecture [9] .......................................................................................... 20
Figure 4 Overall Architecture Of Monami [6] ....................................................................... 21
Figure 5 Mylar Architecture [13] ............................................................................................. 22
Figure 6 Model For Design Of Experiment For This Research Study ................................ 27
Figure 7 System Flow Of CryptDB .......................................................................................... 30
Figure 8 Experiment 01 On Local Test Bed ............................................................................ 45
Figure 9 Experiment 02 On Local Test Bed ............................................................................ 46
Figure 10 Experiment 03 On Local Test Bed .......................................................................... 47
Figure 11 Comparison of MEAN of experiments performed on Local Test bed .............. 48
Figure 12 Experiment 01 On Cloud Test Bed ......................................................................... 50
Figure 13 Experiment 02 On Cloud Test Bed ......................................................................... 51
Figure 14 Experiment 03 On Cloud Test Bed ......................................................................... 52
Figure 15 Experiment 04 On Cloud Test Bed ......................................................................... 53
Figure 16 Comparison Of Mean Of Experiments On Cloud Test Bed ............................... 54
Figure 17 Comparison Of Mean Throughput - Extended Test Cases ................................ 55
Figure 18 Performance Gain After Upgrade - Light Load ................................................... 57
Figure 19 Performance Gain vs Cost - Light Load ................................................................ 58
Figure 20 Performance Gain After Upgrade - Medium Load ............................................. 59
Figure 21 Performance Gain vs Cost - Medium Load .......................................................... 60
Figure 22 Performance Gain After Upgrade - Heavy Load ................................................. 61
Figure 23 Performance Gain vs Cost - Heavy Load .............................................................. 62
LIST OF TABLES
Table 1 Ease Of Implementation Provided By Systems Under Evaluation ....................... 23
Table 2 Range Of Security Provided By Systems Under Evaluation ................................. 24
Table 3 Range Of Functionality Provided By Systems Under Evaluation ........................ 25
Table 4 Comparison Of Solutions ............................................................................................ 25
Table 5 Implementation Scenarios ........................................................................................... 38
Table 6 Specifications of Machines Used In Phase 01 ........................................................... 39
Table 7 Specifications of Machines Used In Phase 02 ........................................................... 40
Table 8 Enhanced Specifications of Machines Used In Phase 02 ........................................ 41
Table 9 Client Side Load ........................................................................................................... 42
Table 10 Phase 01 Test Bed ....................................................................................................... 43
Table 11 Phase 2 Test Bed ......................................................................................................... 49
Table 12 Server Infrastructure - Local Test Bed ..................................................................... 67
Table 13 Server Infrastructure - Cloud Test Bed ................................................................... 67
Table 14 Server Infrastructure - Cloud Test Bed (Extended) ............................................... 67
Table 15 Local Test - Experiment 01 ........................................................................................ 68
Table 16 Local Test - Experiment 02 ........................................................................................ 68
Table 17 Local Test - Experiment 03 ........................................................................................ 69
Table 18 Comparison Of Mean Response Time (Phase 01) ................................................. 69
Table 19 Cloud Test - Experiment 01 ...................................................................................... 70
Table 20 Cloud Test - Experiment 02 ...................................................................................... 70
Table 21 Cloud Test - Experiment 03 ...................................................................................... 71
Table 22 Cloud Test - Experiment 04 ...................................................................................... 71
Table 23 Cloud Test - Comparison Of Mean Response Time .............................................. 72
Table 24 Performance Gain - Light Load ................................................................................ 72
Table 25 Performance Gain - Medium Load .......................................................................... 73
Table 26 Performance Gain - Heavy Load.............................................................................. 73
Table 27 Performance Gain vs Cost (Light Load) ................................................................. 74
Table 28 Performance Gain vs Cost (Medium Load) ............................................................ 74
Table 29 Performance Gain vs Cost (Heavy Load) ............................................................... 74
Chapter 1
DATA SECURITY IN HIGH THROUGHPUT/CLOUD
BASED APPLICATION
1.1 Introduction
The information technology revolution has brought a new way of managing things using information and communication technologies. For the modern world, the adoption of this new approach has resulted in data being generated everywhere. This data relates to every aspect of the underlying systems and therefore must be available as and when required. However, the fast, optimized, cost-effective and efficient solutions offered by adopting information and communication technologies in real-world scenarios also bring a serious concern with them: the security of data.
In the recent past, the security of confidential data has become a major issue, highlighted in particular after the incident in which secret documents were breached by Chelsea Manning [1] [2] and released to WikiLeaks. This incident had a severe impact on United States defense. Later, Edward Snowden [3] [4] revealed many hidden truths about the US surveillance and spying program used to spy on the whole world.
2014 was the most devastating year in terms of leakage of confidential data to adversaries. One study shows that approximately 740 million records were breached by malicious entities [5].
1.2 Motivation
1.2.1 Need For Data Protection
Organizations run on corporate data that is vital to their existence. This includes (but is not limited to) financial documents, policies, future plans, marketing strategies, research documents, employee details, customer details and so on. This data acts like blood in the human body: it flows from department to department and provides the vital information necessary to run business functions.
Keeping in view how precious this data is, data owners always tend to protect the data that is vital for their organization. Much like an "equal and opposite reaction", snoopers, on the other hand, try to get their hands on such corporate data maliciously to gain benefits. This results in a continuous war between data owners and data snoopers.
Data owners take measures to protect the confidentiality, integrity and availability of their data, whereas data snoopers try to compromise these three factors.
It is worth mentioning that almost every system built today protects data by preventing snoopers from breaking into it [6]. This strategy uses different means at different layers of the system to maximize protection. The crux of this strategy is to keep the attacker as far as possible from the data in the system by building various layers of obstacles in the path to the target. These obstacles include (but are not limited to) access control mechanisms, network-level security, operating system checks, security policies, runtime / static application code analysis, trusted hardware, and various intrusion detection and prevention systems. Since security and its breach is a cat-and-mouse game and each side tries to overcome the measures adopted by the other, incidents of data breach still occur even after the above-mentioned obstacles are in place.
In today's hi-tech world, winning the trust of the end user / customer is the key to winning the game. Incidents of data loss or breach hurt the affected organization in two ways. First, they cause the targeted organization huge financial and reputational loss; second, they make the organization answerable to the government for the obligations and regulations imposed on it (as per its industry requirements) [7].
Data snoopers can broadly be categorized into three major classes:
Hackers
Administrators / Insiders
Government agencies
All three have their own intentions and benefits associated with the organization's data and may try to access it in their own ways.
Figure 1 Three Dimensions Of Data Snoopers
1.2.2 Multi-Tier Architecture
Enterprise-level software implementations manipulate data in such quantities that they have gradually generated the requirements for further research in the arena of Big Data. These requirements have in turn led to advancements in efficient and effective data storage, retrieval and processing techniques. These techniques were implemented to fulfill the requirements of high-throughput applications.
Multi-tier application architecture has been in use in the industry for many years. This architecture resolves many issues by defining clear boundaries between the presentation layer, the business logic layer and the data layer. Adopting a multi-tier architecture allows these layers to be updated and modified independently without majorly affecting the other layers. Client-server / multi-tier architecture became common soon after its introduction because of the flexibility, ease of use, performance and control it provides. The majority of enterprise-grade applications in the modern world use the same strategy for deployment and operations [6].
1.2.3 Cloud Computing
In today's world, this architecture has also proven its success when implemented on the cloud hosting model. It has become a de facto standard for web applications hosted on cloud platforms. The effectiveness of this architecture makes it the first choice for applications that require high throughput and a large user base. Due to a list of benefits such as no setup / initial infrastructure costs, a pay-as-you-grow model, less administrative overhead, shifting of technical management expertise to the cloud provider and many others, more and more organizations are moving toward the adoption of public cloud computing. However, cloud-based applications (specifically those built on PaaS) also add another dimension of unintended snoopers, i.e. the cloud server administrators, who may use their server administration privileges to look into the data stored on these servers, outside the control of the enterprises that own the data.
1.2.4 Government Spying
The third stakeholder that counts against the confidentiality of data is the government itself. Government regulations and U.S. security agencies are preying on both states of data, i.e. data in motion and data at rest, held by the companies and infrastructure under their jurisdiction.
1.2.5 End User Privacy
Besides PaaS, SaaS has brought a major revolution in how people use the web. Most of people's daily web-based affairs have now moved to SaaS offerings by major companies. Most people use Gmail, Outlook, etc. for their email; Google Drive and Microsoft OneDrive for their storage; Google Docs, Google Sheets, Google Slides, Office 365, Zoho Office, etc. for their office documentation, and so on. All these offerings, besides their fantastic list of features, effectiveness and ease of use, are avoided by corporations because corporate data needs a level of security that is usually absent in these solutions. These offerings provide good security against the first dimension of data snoopers, i.e. hackers, but against the remaining two, these companies can not only review the data stored on their servers, but in the majority of cases such big companies (Microsoft, Google, Amazon, etc.) have close ties with US government agencies that can access their servers as and when required. Individuals usually compromise on their privacy in exchange for the free offerings of such services.
Companies providing various B2C services also store the data of their customers / clients on their servers. Once compromised, the leakage of the personal information of these clients / customers also impacts the end users, who have nothing to do with either the company or the snoopers, yet whose identity and other related information goes into the wrong hands.
1.3 Objectives of study
This research study was undertaken to identify current trends in international research on ensuring the confidentiality aspect of high-throughput / cloud-hosted multi-tier applications while keeping overhead minimal and achieving acceptable response times. The main goal of this research study is to identify the key areas in high-throughput data applications that are critically vulnerable to loss of data privacy. For this, the target is to find and test an implementable solution that provides the required level of security for such data while keeping the operational and functional overhead minimal, using industry-standard encryption techniques. Performance measurement and analysis is done in both phases, i.e. before and after the implementation of encryption. This performance evaluation is studied and tested on a live cloud using simulated test cases to verify its applicability in real-world scenarios.
To achieve the above-mentioned goal, the following objectives were accomplished in this study:
Studied ways to introduce data security [confidentiality only] in high-throughput / cloud-based web applications
Studied the different techniques in place for this purpose
Selected query processing over encrypted data for performance analysis
Selected CryptDB for the research study to implement and check the performance overhead
Implemented it in a use case designed specifically keeping in view the high-throughput requirements of a national-level medical application such as OpenEMR
Ran the tests on the local machine
Verified initial results from the local infrastructure to assess the feasibility of a production-scale implementation
Ran the tests on Amazon's EC2 cloud in a production-like environment to verify the actual performance fluctuation due to the introduction of confidentiality, using carefully designed use cases covering best to worst cases
Analyzed the results and wrote up the outcomes of this study for presentation
1.4 Scope and Contribution
In this study, we examined the confidentiality aspect of data security in high-throughput / cloud-hosted applications.
The scope of this study is limited to the implementation of the selected encryption model for a multi-tier application on selected use cases of a pre-selected enterprise-grade medical data management application, and to the performance analysis against baseline data gathered from the native application, to see the overhead caused by the introduced confidentiality. These use cases are then used as a proof of concept for implementing the encryption model with minimal implementation, operational and functional overhead.
The contributions made by this study are:
Selection of a multi-tier web application model and addition of a security layer for encryption / decryption
Selection of an effective confidentiality model to be implemented at the security layer
Implementation, assessment and performance analysis of the introduced security layer and its impact on response time in a production-like environment
Compilation of results and the way forward
1.5 Limitations
During this study, the researcher faced the following limitations / issues:
Relocation of the supervisor to Lahore in the middle of the study, which shifted meetings from face-to-face to Skype; over Skype, one cannot fully convey all the work done since the previous meeting due to the limitations of a virtual environment.
Lack of support from the authors of CryptDB, a product of MIT, due to their research engagements (email correspondence); because of this, a scaled-down customized version based on the original technique devised by the CryptDB authors was built by the researcher to use as a proof of concept.
Version changes and issue fixes in the selected national-grade medical record management application (OpenEMR) resulted in rework at some levels.
1.6 Significance of the study
The outcome of this study presents a fully implementable solution, tested for minimal impact on implementation, operational and functional aspects in an enterprise-grade cloud-based environment, which ensures confidentiality of data for high-throughput / cloud-based applications. The same solution can be equally beneficial for privately hosted applications using multi-tier architecture.
1.7 Thesis Outline
This section provides the outline of the rest of the document, which is as follows:
Chapter 2 discusses the background, existing methodologies and prior work on the system under study.
Chapter 3 discusses the proposed working model and the technical solution devised for this study.
Chapter 4 discusses the research methodology and the design of experiments for the research study.
Chapter 5 discusses the tests performed on the proposed model and the raw results obtained during the experiments.
Chapter 6 contains the conclusions drawn from the results and the outcomes achieved by performing the whole research study.
Chapter 2
LITERATURE REVIEW
2.1 Introduction
Theft of data and breaches of security are commonly seen issues in web-based applications. Even high-tech companies become victims of such attacks, such as NVIDIA Corporation, which lost user names and passwords from its servers on January 06, 2015 [8]. Cost-effective offerings from cloud computing push companies to outsource their IT infrastructures and/or hosting to cloud providers [9]. This cloud-based hosting gives a curious or malicious administrator a way to snoop on the data. With a slight modification, the same threat applies to the private cloud as well, where an insider can breach the confidentiality of data. US government agencies, having legal cover, can access data on servers residing on US soil or under the supervision of companies registered in the United States [3] [10].
An effective approach to minimize the impact of a data breach is to encrypt the data [11], which transforms it into an unintelligible form from which the snooper cannot gain any benefit. Besides solving the problem, however, encryption brings its own overheads, including (but not limited to) implementation, functional, operational and performance overheads.
In the upcoming sections, we look at the general architecture of today's enterprise-grade high-throughput applications and the ways we can introduce security through encryption while keeping the above-mentioned overheads minimal.
2.2 Application Architecture
Multi-tier architecture is an architecture used to implement client-server applications with separate layers for presentation, business logic and the database. Another synonym for this architecture is n-tier architecture. The physical separation of the layers results in greater efficiency, greater control and independent updates. Using a multi-tier approach, the designer of the system gives flexibility and independence to the developers by segregating the different technical aspects of the system into different layers. This segregation allows modification of existing layers and/or addition of new layers to cope with new demands on the system without interrupting the entire application.
The most popular and most widely used n-tier architecture internationally is the 3-tier architecture. The working principle of the 3-tier architecture is shown in the figure below.
Figure 2 Multi-Tier Architecture
In this model, the client interacts with the front end / application logic using a software client that displays the presentation layer of the application. This front-end layer interacts with the business logic embedded in the application on the App Server, which in turn requests the data from the data layer on the DB Server as required. Upon receiving the data query, the DB Server prepares the requested data and sends it to the business logic unit on the App Server which, after carefully checking that the client request is satisfied, sends the data to the presentation layer, which formats and displays the data in visual form on the client PC.
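The request flow described above can be condensed into a minimal sketch. The function and table names below are illustrative only (they are not taken from the test application used in this thesis), and an in-memory SQLite database stands in for the DB server:

```python
# Minimal sketch of a 3-tier request flow (hypothetical names).
import sqlite3

def data_layer(conn, emp_id):
    # DB Server: prepares the requested data.
    return conn.execute(
        "SELECT name, designation FROM employees WHERE id = ?", (emp_id,)
    ).fetchone()

def business_logic(conn, emp_id):
    # App Server: validates the request and applies business rules.
    if emp_id <= 0:
        raise ValueError("invalid employee id")
    return data_layer(conn, emp_id)

def presentation(conn, emp_id):
    # Client / front end: formats the result for display.
    name, designation = business_logic(conn, emp_id)
    return f"{name} ({designation})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, designation TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 'Engineer')")
print(presentation(conn, 1))   # Alice (Engineer)
```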
2.3 Ensuring Confidentiality Through Encryption
There are two ways of introducing encryption into a system to ensure confidentiality. The traditional approach is simple and straightforward: to keep the data safe from adversaries, encrypt it using traditional encryption schemes. As discussed in the previous section, in a multi-tier architecture the data resides in the database server, so to achieve this goal one has to implement encryption at the database layer. Nowadays, all medium- to high-end databases support encryption, which makes the initial implementation of encryption for data security very easy using built-in or custom functions. From an operational point of view, however, this neither provides the required security nor is it very efficient in terms of functionality. In terms of security, the built-in functions require both the key and the data to be supplied for encryption; anyone with access to the query log (e.g. the DB server administrators) may thus obtain the encryption keys, and data confidentiality is compromised by the key leakage. It is also not an efficient technique. Consider, for example, a database table containing details about employees. When a user issues a query to see the name and designation of those employees whose monthly salary is greater than or equal to 10,000/-, in an unencrypted environment the database makes use of indexes to quickly scan the table: in a logarithmic number of operations with respect to the number of rows in the table, the database server finds the required results and passes them on to the user. Now consider the traditional encryption approach: since all values in the database table are encrypted, the query has to scan the whole table, retrieve the results, decrypt them, filter them as per the query's requirements, and only then return the results to the user.
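As a concrete illustration of this overhead, the minimal sketch below assumes the third-party Python cryptography package (Fernet) as a stand-in for a conventional randomized column cipher; the table contents are illustrative only. Because equal plaintexts produce different ciphertexts, the salary filter cannot use an index and every row must be decrypted before it can be compared:

```python
# Sketch: with conventional (randomized) column encryption, a range filter
# cannot use an index -- every row must be fetched and decrypted first.
from cryptography.fernet import Fernet

f = Fernet(Fernet.generate_key())

# Encrypted "salary" column as it would be stored in the database.
rows = [(name, f.encrypt(str(salary).encode()))
        for name, salary in [("Alice", 12000), ("Bob", 8000), ("Carol", 15000)]]

# WHERE salary >= 10000 has to be evaluated after decryption, outside the DBMS.
matches = [name for name, enc_salary in rows
           if int(f.decrypt(enc_salary).decode()) >= 10000]
print(matches)   # ['Alice', 'Carol']
```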
Another approach is to implement query processing over encrypted data in the solution, as this approach is far more practical, provides a high degree of security and promises efficient functionality. Homomorphic and functional encryption techniques provide ways to do this. In the next sections, we discuss the overview and current trends in the literature regarding query processing over encrypted data, with a focus on implementable solutions.
2.4 Homomorphic Encryption
Encryption that allows computations to be performed on ciphertext is known as homomorphic encryption: upon decryption, the result matches the result of the same computation performed on the unencrypted data. This concept was introduced by MIT researchers in 1978 under the title of privacy homomorphisms [12]. Many novel approaches have emerged since then to process queries over encrypted data with better efficiency in terms of time and space requirements compared to the original idea.
Significant later work in this area enabled searching using keywords associated with encrypted data [13].
2.4.1 Fully Homomorphic Encryption [FHE]
A cryptosystem that allows an arbitrary set of operations on encrypted data without exposing any information about the underlying plaintext is known as a fully homomorphic cryptosystem. Fully homomorphic encryption schemes are based on asymmetric (public-key) cryptography. This technique guarantees semantic security [14]: for anyone other than the holder of the private key, even possession of the public key does not reveal anything about the plaintext behind a ciphertext.
In such a system, a key generation algorithm is used to set up a pair of keys, public and private. The public key is applied to the plaintext to obtain the ciphertext, which can only be decrypted using the private key of the same pair. Many FHE schemes have now been designed by researchers that are capable of executing and computing almost all operations on encrypted data [15] [16] [17] [18] [19] [6]. FHE theoretically provides semantic security, which requires that a malicious user holding a ciphertext and the public key used to produce it should be unable to learn anything about the underlying plaintext except its length.
It took almost 30 years to arrive at a workable FHE scheme, which was presented by Craig Gentry in 2009. In his research work, a fully homomorphic encryption technique [15] was designed and later tested that allows various computations to be executed directly over encrypted data. The technique is quite marvelous, but due to its orders-of-magnitude execution cost it is not implemented in production environments despite the confidentiality it ensures. Since its introduction, many variants of the original scheme have been proposed that improve on it, but they are still orders of magnitude slower, which prevents production-level implementation in the real world [6].
2.4.2 Partial Homomorphic Encryption (PHE)
Systems that fall under partial homomorphic encryption (PHE) are those that support only some operation(s) on ciphertext, such as addition, multiplication or quadratic functions. Some examples of PHE are the BGN cryptosystem [20], ElGamal [21], Paillier [22] and Goldwasser-Micali [23]. They provide semantic security, with the restriction that they support only a specific function to be computed over encrypted data. Their performance is better than that of existing FHE schemes [6].
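To illustrate the additive property that Paillier [22] provides, the sketch below implements textbook Paillier with deliberately tiny, insecure parameters; it is a teaching aid only and not the implementation used by CryptDB or any production system:

```python
# Textbook Paillier with toy parameters, showing the additive homomorphic
# property: multiplying ciphertexts decrypts to the sum of the plaintexts.
import math, random

p, q = 11, 13                                   # toy primes (insecure)
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
L = lambda u: (u - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)             # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 42, 17
c = (encrypt(a) * encrypt(b)) % n2              # multiply ciphertexts ...
print(decrypt(c))                               # ... to add plaintexts: 59
```

This additive property is what an additive scheme such as Paillier contributes to query processing over encrypted data, e.g. SUM-style aggregation without decrypting individual values.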
2.4.3 Functional Encryption (FE)
Computation over encrypted data using functional encryption (FE) schemes results in controlled leakage of information related to the function applied to the ciphertext. This small information leakage improves the performance of the system many-fold; in any case, the encrypted data itself is not revealed in any form. The concept was initially introduced by Amit Sahai and Brent Waters in their paper "Fuzzy Identity-Based Encryption" in 2005 [24]. Keeping in view the performance gain achieved by the controlled leakage of information required for the computation, many functional encryption schemes have been designed to compute specific functions over encrypted data efficiently in terms of compute, time and space [25] [26] [27].
2.5 Current Frameworks
In this phase of the study, frameworks from the existing literature were reviewed that provide systems for query processing over encrypted data. Due to time limitations, the scope was narrowed to systems that provide an implementable framework for real-world deployment, so that performance analysis could test their suitability in production-like scenarios. Later, the one that provides the best balance of ease of implementation versus the efficiency and security provided was chosen for a test implementation. The details of these frameworks are given below.
2.5.1 CryptDB
A breakthrough research approach conceived by researchers at MIT resulted in the creation of CryptDB [11], a system that provides an implementable way to add confidentiality over databases transparently and allows execution of queries over encrypted data with minimal overhead.
CryptDB is a SQL-based implementation and hence can be installed with all major SQL-based DBMSs. The CryptDB design is based on two fundamental ideas: first, the ability to execute SQL queries over encrypted data; second, adjusting the encryption of the data as required, in terms of both security and functionality.
Figure 3 CryptDB Architecture [9]
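The second idea, adjustable encryption, can be sketched as nested layers. The sketch below is a simplification of CryptDB's onion approach, not its actual implementation: the outer randomized layer uses the third-party cryptography package (Fernet), and the inner deterministic layer is merely simulated with a keyed HMAC, which, unlike CryptDB's DET scheme, cannot be decrypted:

```python
# Sketch of adjustable ("onion") encryption: values start under a randomized
# layer; when an equality query arrives, the proxy peels down to a deterministic
# layer so the DBMS can compare ciphertexts directly.
import hmac, hashlib
from cryptography.fernet import Fernet

det_key = b"det-demo-key"                      # illustrative key only
rnd = Fernet(Fernet.generate_key())

def det(value: str) -> bytes:
    # Deterministic layer simulated with a keyed HMAC (not decryptable).
    return hmac.new(det_key, value.encode(), hashlib.sha256).digest()

# Stored form: RND(DET(value)) -- no computation is possible on the outer layer.
stored = [rnd.encrypt(det(v)) for v in ["alice", "bob", "alice"]]

# An equality query arrives: peel the RND layer, then compare DET ciphertexts.
peeled = [rnd.decrypt(c) for c in stored]
print(peeled[0] == peeled[2], peeled[0] == peeled[1])   # True False
```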
2.5.2 Monami
Monami is another system developed for query processing over encrypted data; it specifically targets OLAP (online analytical processing) databases [9]. The key feature of Monami is that it processes queries over encrypted data in two modes. It splits complex queries and runs most of the work at the database end; however, the database supports only a few of all query types over encrypted data. Monami therefore converts unsupported complex queries in such a way that the supported portion of the query executes on the database while the remaining portion of the same query runs on the client side.
Figure 4 Overall Architecture Of Monami [6]
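This split-execution idea can be sketched as follows; the column names, keys and query are hypothetical, deterministic encryption is simulated with a keyed HMAC, and the third-party cryptography package (Fernet) stands in for the value encryption. The equality predicate runs on the server side, while the range filter and the aggregation run on the client after decryption:

```python
# Sketch of split query execution: the supported part of the predicate runs
# against the encrypted store, the rest runs client-side after decryption.
import hmac, hashlib
from cryptography.fernet import Fernet

det_key, f = b"det-demo-key", Fernet(Fernet.generate_key())
det = lambda v: hmac.new(det_key, v.encode(), hashlib.sha256).digest()

# Server-side store: dept under deterministic encryption, salary under Fernet.
server_rows = [{"dept": det(d), "salary": f.encrypt(str(s).encode())}
               for d, s in [("sales", 900), ("sales", 1500), ("hr", 2000)]]

# Query: SELECT SUM(salary) WHERE dept = 'sales' AND salary > 1000
candidates = [r for r in server_rows if r["dept"] == det("sales")]   # server part
total = sum(s for s in (int(f.decrypt(r["salary"]).decode()) for r in candidates)
            if s > 1000)                                             # client part
print(total)   # 1500
```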
2.5.3 Mylar
Mylar is another framework that allows query processing over encrypted data [28]. Mylar encrypts data when it is written to the server and decrypts it in the end user's browser. It is still in the research phase (the first paper on Mylar was presented at USENIX NSDI '14); however, the promises made by Mylar are fascinating. Mylar supports keyword searches over encrypted data using multi-key searchable encryption [29]. The beauty of this web application development platform is that it can search over encrypted data that has been encrypted with different keys.
Figure 5 Mylar Architecture [13]
2.5.4 Selection for test implementation
For this study, due to time limitations, we had to select one of these systems for implementation and performance analysis in an enterprise-grade deployment over the cloud. All of the above solutions were gauged on the following factors.
Ease of Implementation: For this aspect, the solutions were gauged on how complex or easy they are to implement in a real-world scenario. Quantification was done on a scale of 1 to 10 under the following classification:
o Minimal effort at implementation time = 8-10 marks
o Medium to minimal effort at implementation time = 5-8 marks
o Minimal effort at design time = 3-5 marks
o Medium to minimal effort at design time = 0-3 marks
Below are the results for ease of implementation.

Ease Of Implementation (10)           CryptDB (10)   Mylar (10)   Monami (10)
Data Tier Changes (-1 to -3)              (-02)         (-02)        (-02)
Application Tier Changes (-1 to -3)       (-00)         (-02)        (-02)
Client Tier Changes (-1 to -3)            (-00)         (-02)        (-01)
Other Tier Changes (-1)                   (-01)         (-01)        (-01)
Results                                     07            03           04

Table 1 Ease Of Implementation Provided By Systems Under Evaluation
Range of Security: For this aspect, the solutions were gauged on how much security they provide and how many threat levels they nullify after implementation, i.e. client side, web server side and database server side. Quantification was done on a scale of 1 to 10 under the following classification:
o Protection at the maximum number of tiers = 8-10 marks
o Protection at a medium number of tiers = 5-8 marks
o Protection at a minimum number of tiers = 3-5 marks
o No to minimal protection across tiers = 0-3 marks
Below are the results for range of security.

Range Of Security (00)                CryptDB   Mylar   Monami
DB Level (+04)                          (04)     (04)    (04)
Application Level (+02)                 (00)     (02)    (00)
Client Level (+02)                      (00)     (02)    (00)
Other (+01)                             (01)     (01)    (01)
Multi Key Search Support (+01)          (01)     (01)    (01)
Result (10)                              06       10      06

Table 2 Range Of Security Provided By Systems Under Evaluation
Range of Functionality: For this aspect, the solutions were gauged on how much functionality they provide / support for the prime target of query processing over encrypted data. Quantification was done on a scale of 1 to 10 under the following classification:
o Supports confidentiality of columns that are not processed during query processing [basic requirement for an encrypted database] = 1 mark
o Supports equality checks in queries processed over encrypted data [select queries with equality checks, equality joins, DISTINCT, GROUP BY and COUNT] = 1 mark
o Supports order processing in queries over encrypted data [filtering data on ranges, sorting using ORDER BY, MIN, MAX] = 2 marks
o Supports functions like SUM / addition [support for computation in queries processed over encrypted data] = 2 marks
o Supports JOIN on encrypted columns [support for relationships over encrypted data] = 2 marks
o Supports word search in text columns [queries containing LIKE or equality on text columns containing encrypted data] = 2 marks
Below are the results for range of functionality.

Range Of Functionality (00)           CryptDB (00)   Mylar (00)   Monami (00)
Non Processed Col. Conf. (+01)            (01)           (01)         (01)
Equality Check (+01)                      (01)           (01)         (01)
Order Preserving (+02)                    (02)           (00)         (02)
Sum (+02)                                 (02)           (00)         (02)
Join (+02)                                (02)           (00)         (02)
Like / Keyword Search (+02)               (02)           (02)         (02)
Results                                    10             04           10

Table 3 Range Of Functionality Provided By Systems Under Evaluation
Below are the combined results for the above-mentioned measurements.

Type                  CryptDB (00)   Mylar (00)   Monami (00)
Ease (0.4)             06x0.4=2.4     03x0.4=1.2    04x0.4=1.6
Functionality (0.3)    10x0.3=3.0     04x0.3=1.2    10x0.3=3.0
Security (0.3)         06x0.3=2.4     10x0.3=4.0    06x0.3=2.4
Results                7.2            6.4           6.6

Table 4 Comparison Of Solutions
Based on the above results, we selected CryptDB as our product for test implementation; its details are given in the next chapter.
Chapter 3
CRYPTDB
3.1 Introduction
This chapter discusses the technical solution for this study. This technical solution was built as a limited-scope proof of concept only, to assess the feasibility of a large-scale implementation of the concepts gained through this study.
3.2 Security in Multi-Tier Architecture Using CryptDB
Multi-tier application architecture has been in use in the industry for many years. This architecture resolves many issues by defining clear boundaries between the presentation layer, the business logic layer and the data layer. It is commonly used in enterprise-grade applications hosted on private infrastructure as well as in the cloud. The same model was adopted for the design of experiments for this research study.
In the solution model, another server named CRYPTDB is placed between the application server and the database server, as shown in Figure 6.
Figure 6 Model For Design Of Experiment For This Research Study
This CryptDB server intercepts the traffic between the web server and the database server and acts on the two-way traffic to introduce a security layer using the following basic rules (a simplified code sketch of these rules follows the list):
Insert Query: This CryptDB will receive the insert queries from the application
server, rewrite them after encrypting data values using a preselected key and send
them to the database server to write encrypted data in the relevant tables.
Select Query : Upon retrieval of data, the CryptDB intercepts the select query,
dissect it, and perform operation on it based on the below mentioned rules
o If it is a simple select query, the CryptDB forwards the query to the
database, get the data returned, decrypt it with the pre-selected key and
forward the query to the database server. Upon receiving the encrypted
results, it decrypts the results and send the data to the webserver.
o If it contains a where clause and match contain straight equal to condition,
it encrypt the condition value with the key to rewrite them to demand data
CLIENT
APP
DB
CRYPTDB
Figure 6 Model For Design Of Experiment For This Research Study
- 28 -
from the server in the encrypted form, decrypt it and send the required data
to the application server for onward submission to client.
o If it contains a where clause and match contains complexity e.g. like, greater
than, less than etc., then the sentry demands the whole data from the table
for the specific column (s) used in the where clause, decrypt them, perform
a match function, note down the row id(s), retrieve the selected columns for
these row ids, decrypt them and send the data to the webserver.
Delete Query : Upon deletion of data, the CryptDB intercepts the delete query,
dissect it, and perform operation on it based on the below mentioned rules
o If it is a simple delete query, the CryptDB forwards the query to the
database for deletion of the data from the table.
o If it contains a where clause and match contain straight equal to condition,
it encrypt the condition value with the key to rewrite them to delete data
from the server.
o If it contains a where clause and match contains complexity e.g. like, greater
than, less than etc., then the sentry demands the whole data from the table
for the specific column (s) used in the where clause, decrypt them, perform
a match function, note down the row id(s) and then delete the rows form
the table matching these identified row id(s).
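As a rough illustration of the equality-rewriting rule above, the following PHP sketch
shows how a proxy could replace the literal in a WHERE equality with a deterministically
encrypted value. The helper and column names (det_encrypt, the _det suffix) and the key
handling are hypothetical; this is not CryptDB's actual code, which runs inside a MySQL
proxy, but only an illustration of the idea.

<?php
// Illustration only: rewrite WHERE col = 'literal' so that the database server can
// answer it over deterministically encrypted data. Names are hypothetical.
function det_encrypt(string $key, string $value): string {
    // Deterministic AES-128-CBC with a fixed all-zero IV (CryptDB itself uses an
    // AES-CMC variant for its DET layer; a zero-IV CBC is enough for a sketch).
    return openssl_encrypt($value, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, str_repeat("\0", 16));
}

function rewrite_equality(string $sql, string $key): string {
    // Replace the plaintext literal with the hex of its deterministic ciphertext
    // and point the predicate at the encrypted column.
    return preg_replace_callback(
        "/WHERE\s+(\w+)\s*=\s*'([^']*)'/i",
        function (array $m) use ($key): string {
            return "WHERE {$m[1]}_det = x'" . bin2hex(det_encrypt($key, $m[2])) . "'";
        },
        $sql
    );
}

$key = random_bytes(16);
echo rewrite_equality("SELECT fname FROM patients WHERE cnic = '3520212345671'", $key), "\n";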
3.3 Query Execution and Data Confidentiality
The power of CryptDB lies in enabling the DBMS to execute queries over encrypted data
in the same way as it executes them over unencrypted data. The strength of this approach
is that no changes are needed in existing applications, which makes it possible to
introduce confidentiality transparently at installation / configuration time.
The CryptDB server stores the secret MASTER KEY (MK), the metadata about the schema
of the application's back end, and the current security state of the columns in that schema.
During installation / configuration, CryptDB intercepts the creation of the schema and
replaces the table and field names with anonymized names that a human cannot readily
interpret. This protects the schema itself, because a person with access to the database
server or the application server does not know the mapping of fields and tables. CryptDB
also adds some tables / columns to the existing tables to manage the encryption, and
installs user defined functions (UDFs) that enable the DBMS to compute over encrypted
data.
The system flow of CryptDB is shown in Figure 7.
3.3.1 CryptDB Scope For Confidentiality
CryptDB only ensures the confidentiality of the data on the database server; its scope does
not include the integrity or availability of the data, and it targets protection of the
database server only. In terms of confidentiality, CryptDB provides strong protection
against a database administrator or a snooper having full control over the database
server, and hence keeps the data safe in case the DB server is compromised.
Figure 7 System Flow Of CryptDB (the application server issues a query; the CryptDB
server intercepts it, changes the table / column names in the query and encrypts the
constants; it checks whether an onion adjustment is required and, if so, first issues an
update query for the onion adjustment; the rewritten encrypted query is forwarded to the
DB server, which executes it and returns encrypted results; CryptDB decrypts the results
and forwards them to the application server)
CryptDB provides security guarantees for the data against threats such as compromise of
the DBMS software, a snooper who succeeds in gaining root-level access to the database
server machine, the database administrator or any other entity trying to access the data
through the database management system, and even someone with access to the physical
RAM of the database server. It also protects the data when the database server is hosted
on third-party cloud infrastructure. The guarantees provided by CryptDB cover the data
contents as well as the names of tables and columns.
CryptDB encrypts the data in two ways. The data owner can classify the data columns of
a table into two categories, SENSITIVE and BEST-EFFORT. For columns declared
SENSITIVE, CryptDB ensures semantic security; query processing over the encrypted
data in these columns becomes limited, but the security is strongest. For columns declared
BEST-EFFORT, CryptDB ensures the maximum security compatible with the query
operations on those columns; query processing over the encrypted data is fully enabled,
but this may weaken semantic security and leak some information about the contents of
these columns, depending on the query requirements.
3.3.2 CryptDB Encryption Schemes
Following are the encryption schemes used by CryptDB to ensure the confidentiality of
data.
3.3.2.1 RND Scheme
The RND (random) encryption scheme implements a block cipher (AES) in CBC mode [30]
with a random IV, except when the targeted column is an integer column; for integer
columns it uses the Blowfish algorithm to save space, since AES uses a 128-bit block
whereas Blowfish uses a 64-bit block. Columns secured with RND provide the best
protection for the data contents but do not allow any computation over the encrypted
data.
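For illustration, a minimal PHP sketch of an RND-style value encryption (AES-128-CBC
with a fresh random IV per value, via PHP's OpenSSL extension) could look as follows;
it is only a sketch, not CryptDB's implementation.

<?php
// RND sketch: a fresh random IV per value makes equal plaintexts encrypt differently,
// so no computation (equality, ordering, ...) is possible on the stored ciphertexts.
function rnd_encrypt(string $key, string $value): string {
    $iv = random_bytes(16);
    return $iv . openssl_encrypt($value, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, $iv);
}

function rnd_decrypt(string $key, string $blob): string {
    return openssl_decrypt(substr($blob, 16), 'aes-128-cbc', $key, OPENSSL_RAW_DATA, substr($blob, 0, 16));
}

$key = random_bytes(16);
var_dump(rnd_encrypt($key, 'Aspirin') === rnd_encrypt($key, 'Aspirin')); // bool(false)
var_dump(rnd_decrypt($key, rnd_encrypt($key, 'Aspirin')));               // string "Aspirin"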
3.3.2.2 DET Scheme
The DET (deterministic) encryption scheme implements a block cipher (AES) using a
variant of CMC mode [30] with a zero IV, except when the targeted column is an integer
column; for integer columns it uses the Blowfish algorithm to save space, since AES uses
a 128-bit block whereas Blowfish uses a 64-bit block. The requirement is that the same
plaintext in a column always produces the same ciphertext. This property enables the
database server to evaluate equality, perform joins and compute GROUP BY, COUNT,
DISTINCT, etc.
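The determinism property itself can be sketched in a few lines of PHP. Here a fixed
zero IV stands in for CryptDB's CMC variant, and the Blowfish handling for integer
columns is omitted; this is an illustration of the property, not the real scheme.

<?php
// DET sketch: with a fixed IV, equal plaintexts give equal ciphertexts, which is
// exactly what lets the DBMS evaluate =, JOIN, GROUP BY, COUNT and DISTINCT
// directly on the stored (encrypted) values.
function det_encrypt(string $key, string $value): string {
    return openssl_encrypt($value, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, str_repeat("\0", 16));
}

$key = random_bytes(16);
var_dump(det_encrypt($key, 'O+') === det_encrypt($key, 'O+'));  // bool(true): equality survives
var_dump(det_encrypt($key, 'O+') === det_encrypt($key, 'AB-')); // bool(false)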
3.3.2.3 OPE Scheme
OPE (order-preserving encryption) [31] preserves the order relationship of the data items
while encrypting them: if x < y, then OPEk(x) < OPEk(y), where k is the secret key used to
encrypt the data. Data columns encrypted using OPE enable SORT, MIN, MAX, ORDER
BY, etc. to be executed over encrypted data.
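A toy order-preserving mapping for small non-negative integers can make the
x < y implies OPEk(x) < OPEk(y) property concrete. This is emphatically not the
Boldyreva et al. scheme [31] used by CryptDB; it is neither efficient nor secure, and is
included only as an illustration.

<?php
// Toy OPE: map x to the cumulative sum of key-derived positive gaps, so the numeric
// order of plaintexts is preserved by the encoded values. Illustration only.
function toy_ope(string $key, int $x): int {
    $sum = 0;
    for ($i = 1; $i <= $x; $i++) {
        $gap = 1 + hexdec(substr(hash_hmac('sha256', (string)$i, $key), 0, 2)); // 1..256
        $sum += $gap;
    }
    return $sum;
}

$key = 'demo-key';
var_dump(toy_ope($key, 30) < toy_ope($key, 45)); // bool(true): ORDER BY / MIN / MAX still work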
3.3.2.4 Paillier Cryptosystem
The Paillier cryptosystem [22] is an additively homomorphic encryption scheme that
allows certain computations over encrypted data. Using this scheme enables the
calculation of aggregates such as SUM and AVG, and of queries that require additions
such as salary = salary + 100.
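The additive property behind this can be written out explicitly (standard Paillier
notation, with public modulus n and encryption function E; this is textbook material
rather than anything specific to CryptDB):

E(m_1) \cdot E(m_2) \bmod n^2 = E(m_1 + m_2 \bmod n), \qquad E(m)^{c} \bmod n^2 = E(c \cdot m \bmod n)

So an update such as salary = salary + 100 is executed by multiplying the stored
ciphertext by E(100) modulo n^2, and AVG is obtained by dividing a homomorphically
computed SUM by COUNT after decryption.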
3.3.2.5 SEARCH
The SEARCH scheme provides the word-search functionality in CryptDB [13]. Columns
encrypted with SEARCH allow queries containing the LIKE operator to be executed over
the encrypted data; however, this functionality is limited to full words, and matching on
partial words is not possible. SEARCH is nearly as secure as RND.
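The following PHP sketch conveys why the search works only on whole words: every
word is replaced by a keyed HMAC token and matching is done on tokens. This is a much
simplified stand-in for the Song-Wagner-Perrig construction [13] that CryptDB actually
uses, not an implementation of it.

<?php
// Simplified keyword-token search: store an HMAC token per word, search by token.
// A full word matches because its token is in the stored set; a partial word
// produces a different token and therefore cannot match.
function word_tokens(string $key, string $text): array {
    $tokens = [];
    foreach (preg_split('/\W+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $tokens[] = hash_hmac('sha256', $word, $key);
    }
    return $tokens;
}

$key    = 'demo-key';
$stored = word_tokens($key, 'Patient complains of chest pain');
var_dump(in_array(hash_hmac('sha256', 'chest', $key), $stored, true)); // bool(true)
var_dump(in_array(hash_hmac('sha256', 'ches',  $key), $stored, true)); // bool(false)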
Chapter 4
RESEARCH METHODOLOGY
4.1 Introduction
This chapter describes the research methodology followed in this study. Details about the
application scenario, the phases of the experiments, the experiments performed, the
workload generation and the evaluation criteria are discussed in the following sections.
4.2 Sample Application Scenario
One key example of a high-throughput application is a national / enterprise level medical
health management system. Many countries already have such infrastructure
implemented and operational: every citizen is registered, and whenever medical
treatment is provided, it is recorded so that a complete medical history is available for
later treatment decisions. Other stakeholders of the medical system, such as hospitals,
laboratories, pharmacists and insurance companies, can also use this system to provide
medical and financial services to the patient and to other stakeholders. The upfront
benefits of such implementations include (but are not limited to) the following.
o Patients have their complete medical history in place in electronic format, readily
available when required by the doctors.
o Doctors have a complete picture of the patient's past medical history in front of
them, enabling better treatment decisions in view of current and past diseases,
allergies, etc.
o Drug control can be implemented, as pharmacists issue drugs to patients only
after receiving an online prescription from the doctor.
o Lab test management becomes easier, as test requests and results are readily
available to the patient and the doctors as and when required.
o Insurance companies can provide medical insurance services to their subscribers
efficiently and transparently: after immediate verification of the validity of the
patient's medical insurance coverage, the hospital starts providing medical
services and forwards the bills to the insurance company for later clearance.
o The government can track ailment and disease propagation patterns and act
accordingly to devise and implement national health policies that better serve the
citizens.
o National / local level disease identification and spread / control graphs can be
produced based on the cases reported for a particular disease.
o Researchers can work on the medical history of patients with a specific disease
and test the effectiveness of different medicines against that disease to study the
success ratio.
4.3 Test Scenario
High-throughput applications such as medical and health record management systems
are widely used in developed countries to manage the health records of their citizens, and
this approach is expected to be followed by other nations as they roll out information and
communication technology infrastructure. Medical records mirror the personal attributes
and conditions of a patient and need confidential handling and careful access control.
Besides this privacy requirement, medical records are also used for research and
development in health service provisioning at the national scale.
Encryption can be used in such applications, initially to achieve confidentiality of the
data; later it can also be used to control access to the data by sharing the encryption keys
only with the entities that genuinely require access. However, encryption introduces
additional processing overhead both when writing and when reading data, and this
overhead discourages high-throughput applications such as enterprise / national level
medical and healthcare record management systems from implementing it, in order to
retain acceptable performance.
4.3.1 Technologies Used During Experiments
Keeping in view the above facts, OpenEMR [32] was selected as the test application.
OpenEMR is an electronic health record management system with complete coverage of
the dimensions of an integrated health management system. The technologies behind the
test setup are open source, i.e. LAMP (Linux, Apache, MySQL, PHP): all servers run the
Ubuntu Server operating system, PHP is used for coding the test scenarios, and the back-
end data is stored in a MySQL database.
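In this setup no OpenEMR code needs to change; only its database connection settings are
pointed at the CryptDB proxy instead of MySQL. Assuming the usual
sites/default/sqlconf.php settings file and a proxy listening on port 3307 (both should be
checked against the deployed OpenEMR version), the change looks roughly like this:

<?php
// sites/default/sqlconf.php (excerpt, illustrative values for this test bed):
// OpenEMR talks to the CryptDB proxy; the proxy talks to MySQL on the DB server.
$host  = '10.0.0.12';  // CryptDB / MySQL-Proxy machine, not the MySQL server itself
$port  = '3307';       // proxy port (MySQL keeps listening on 3306 behind it)
$login = 'openemr';
$pass  = 'secret';
$dbase = 'openemr';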
4.3.2 Implementation Scenarios
Four experimental implementation scenarios were selected for testing and comparison of
results, as listed below.
Scenario Name | Details                                                    | Remarks
Experiment-01 | One virtual machine                                        | Application server and database server on the same machine, no CryptDB
Experiment-02 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, no CryptDB
Experiment-03 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, CryptDB installed on the DB server
Experiment-04 | Three virtual machines, one each for the web server, DB server and CryptDB | Application server, database server and CryptDB on separate machines
Table 5 Implementation Scenarios
4.4 Test Execution
Experiments were planned and executed in two phases, the details of which are given below.
4.4.1 Experiment On Local Test Bed
In the first phase, the model shown in Figure 6 was implemented on local machines to
test and validate its functionality. Three servers were prepared for the first three
experiments; due to the limited resources of the local infrastructure, Experiment 04 was
not performed in this phase.
The web server in this test phase was implemented using the Apache HTTP server [33].
The database used was MySQL [34], the widely used open source database management
system. The CryptDB functionality was implemented as a proof of concept using MySQL
Proxy, with the interception logic written in the Lua programming language. OpenEMR
[32], an open source enterprise-grade electronic medical record management system, was
selected as the sample application on which to implement data security. Fabricated
records of 100,000 (one hundred thousand) patients were generated and inserted into the
databases of all three experiments. For virtual load generation to simulate users, we used
JMeter [35], a popular open source performance measurement tool.
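The fabricated records were bulk-inserted through the proxy so that they were stored
encrypted. The following PHP/PDO sketch shows the shape of such a generator; the host,
credentials and the patient_data column subset shown are illustrative assumptions rather
than the exact script used in the experiments.

<?php
// Generate and insert synthetic patient rows through the CryptDB proxy with PDO.
$pdo  = new PDO('mysql:host=10.0.0.12;port=3307;dbname=openemr', 'openemr', 'secret');
$stmt = $pdo->prepare('INSERT INTO patient_data (fname, lname, DOB, sex) VALUES (?, ?, ?, ?)');

for ($i = 1; $i <= 100000; $i++) {
    $stmt->execute([
        'First' . $i,
        'Last' . $i,
        date('Y-m-d', mt_rand(strtotime('1940-01-01'), strtotime('2010-12-31'))),
        ($i % 2) ? 'Male' : 'Female',
    ]);
}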
Our pilot testing infrastructure in phase 1 was built on virtual machines on a Core i5
system with 4 GB RAM and a 128 GB SSD. Three machines were configured to host the
web server, the database server and the user load generator, with the following
specifications.
Sr. # | Machine             | vCPU | RAM | Hard Disk
1     | Web Server          | 1    | 1GB | 8GB SSD
2     | DB Server           | 1    | 1GB | 8GB SSD
3     | User Load Generator | 1    | 1GB | 8GB SSD
Table 6 Specifications of Machines Used In Phase 01
4.4.2 Experiments On Amazon Cloud
In the second phase, the model shown in Figure 6 was implemented on the Amazon cloud
(EC2). Eight servers were prepared for these scenarios, with different machine
configurations so that the server infrastructures could be compared with each other.
The software stack was the same as in phase 1: the Apache web server [33], the MySQL
database [34], the CryptDB proof of concept built on MySQL Proxy with the interception
logic written in Lua, and OpenEMR [32] as the sample application. Fabricated records of
100,000 patients were again generated and inserted into the databases of the experiments,
and JMeter [35] was used to generate the virtual user load.
Our pilot testing infrastructure in phase 2 was initially built on virtual machines with the
following specifications.
Sr. # | Machine               | vCPU | RAM | Hard Disk
1     | Web Server            | 2    | 4GB | 8GB SSD
2     | DB Server             | 2    | 4GB | 8GB SSD
3     | CryptDB               | 2    | 4GB | 8GB SSD
4     | DB Server For CryptDB | 2    | 4GB | 8GB SSD
5     | User Load Simulator   | 2    | 4GB | 8GB SSD
Table 7 Specifications of Machines Used In Phase 02
Later, the following machines with enhanced specifications were added for further tests.
Sr. # | Machine               | vCPU | RAM   | Hard Disk
6     | Web Server            | 2    | 7.5GB | 32GB SSD
7     | CryptDB               | 2    | 7.5GB | 32GB SSD
8     | DB Server For CryptDB | 2    | 7.5GB | 32GB SSD
Table 8 Enhanced Specifications of Machines Used In Phase 02
4.5 Test Use Cases
Use cases were designed keeping in view the capabilities of CryptDB as well as the
requirements and design of OpenEMR, the selected high-throughput application. These
use cases were designed for and executed in JMeter, a popular open source performance
testing and benchmarking tool from the Apache project.
4.5.1 Test Bed Environment
To create an enterprise-grade scenario, the schema of the OpenEMR tables relevant to the
test scenarios was ported to the database, and 100,000 records were generated and
entered into these tables to mimic a production data set. During insertion of these records
through CryptDB in encrypted form, the average insertion time per record at CryptDB's
most secure onion level was observed to be about 10 milliseconds, which is very
acceptable.
4.5.2 Client Side Load
Keeping in view the security capabilities of CryptDB and the processing they require, a
carefully selected set of scenarios was created for load testing against the servers. These
scenarios were classified as Light Load, Medium Load and Heavy Load, with the
following characteristics.
Sr. # | Load Name   | Compute Intensive | Memory Intensive | Data Intensive
01    | Light Load  | Low               | Low              | Low
02    | Medium Load | Medium            | Medium           | Medium
03    | Heavy Load  | High              | High             | High
Table 9 Client Side Load
These use cases were programmed in PHP and hosted on the web server in the test bed.
A synthetic workload based on these custom loads was generated against the experiment
scenarios, and the response time was recorded for further analysis (a simplified sketch of
one such use-case page is shown below).
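The shape of such a use-case page is sketched below in PHP; the host, credentials and
exact query mix are illustrative, and the real pages were tuned per load level against the
OpenEMR schema.

<?php
// medium_load.php (illustrative): one equality lookup (DET-backed) and one
// range + ordering query (OPE-backed), with the elapsed time reported back.
$pdo   = new PDO('mysql:host=10.0.0.12;port=3307;dbname=openemr', 'openemr', 'secret');
$start = microtime(true);

$pdo->query("SELECT fname, lname FROM patient_data WHERE pid = 4711")->fetchAll();
$pdo->query("SELECT pid, DOB FROM patient_data WHERE DOB > '1980-01-01' ORDER BY DOB LIMIT 50")->fetchAll();

header('Content-Type: text/plain');
echo 'elapsed_ms=', round((microtime(true) - $start) * 1000);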
Chapter 5
EXPERIMENTS & RESULTS
5.1 Introduction
This chapter presents the results obtained from the experiments executed during the
study. As the tests were performed in two phases, the results are presented in the same
order.
5.2 Experiments On Local Test Bed
In phase 01, which was performed on local machines to validate the approach, three
machines were prepared to run the first three experiments, listed below.
Scenario Name | Details                                                    | Remarks
Experiment-01 | One virtual machine                                        | Application server and database server on the same machine, no CryptDB
Experiment-02 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, no CryptDB
Experiment-03 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, CryptDB installed on the DB server
Table 10 Phase 01 Test Bed
All three loads were executed against experiments 01, 02 and 03 of the test bed, initially
using a browser to validate the implementation and configuration. After this validation,
the same loads were defined in JMeter and executed to simulate from 10 to 100 concurrent
users, increasing by 10 users per iteration. Each iteration was executed 5 times to smooth
out outliers, and the average of these 5 executions was used for the analysis.
Below are the results obtained from the phase 01 tests.
5.2.1 Experiment-01 On Local Test Bed
In this experiment on the local test bed, the light, medium and heavy loads were
generated against the Experiment 01 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 100, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of a single machine with the specifications given in Table 6,
hosting both the web server and the database server; no encryption is involved at this
stage. The three loads were executed successfully against this implementation, as shown
in Figure 8.
Figure 8 Experiment 01 On Local Test Bed
5.2.2 Experiment-02 On Local Test Bed
In this experiment on the local test bed, the light, medium and heavy loads were
generated against the Experiment 02 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 100, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 6; no encryption is involved at this stage.
The three loads were executed successfully against this implementation, as shown in
Figure 9.
Figure 9 Experiment 02 On Local Test Bed
5.2.3 Experiment-03 On Local Test Bed
In this experiment on the local test bed, the light, medium and heavy loads were
generated against the Experiment 03 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 100, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 6; for encryption, CryptDB was installed on
the same machine as the database server. The three loads were executed successfully
against this implementation, as shown in Figure 10.
Figure 10 Experiment 03 On Local Test Bed
5.2.4 Comparison Of Means Of Experiments On Local Test Bed
Since in a real-world deployment the load on a website is a complex mixture of the
synthetic loads defined for this study, the mean of all three loads (light, medium and
heavy) is presented in Figure 11.
Keeping a threshold of 5000 ms for the response time, it was observed that after the
introduction of encryption, the local infrastructure can handle up to 50 concurrent users.
Given the limitations of the local test environment, this figure of 50 supported concurrent
users is a very positive signal to further investigate the approach in an environment
mimicking a real production environment. This leads to phase 02 of the study, in which
the experiments are performed on a cloud-based test bed.
Figure 11 Comparison of MEAN of experiments performed on Local Test bed
5.3 Experiments On Amazon Cloud
In phase 02, performed on Amazon's EC2 cloud, nine machines were prepared to test the
experiment scenarios listed below.
Scenario Name | Details                                                    | Remarks
Experiment-01 | One virtual machine                                        | Application server and database server on the same machine, no CryptDB
Experiment-02 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, no CryptDB
Experiment-03 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, CryptDB installed on the DB server
Experiment-04 | Three virtual machines, one each for the web server, DB server and CryptDB | Application server, database server and CryptDB on separate machines
Table 11 Phase 2 Test Bed
All three loads were executed against experiments 01, 02, 03 and 04 of the test bed,
initially using a browser to validate the implementation and configuration. After this
validation, the same loads were defined in JMeter and executed to simulate from 10 to
200 concurrent users, increasing by 10 users per batch.
Below are the results obtained from the phase 02 tests.
5.3.1 Experiment-01 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 01 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of a single machine with the specifications given in Table 7,
hosting both the web server and the database server; no encryption is involved at this
stage. The three loads were executed successfully against this implementation, as shown
in Figure 12.
Figure 12 Experiment 01 On Cloud Test Bed
5.3.2 Experiment-02 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 02 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 7; no encryption is involved at this stage.
The three loads were executed successfully against this implementation, as shown in
Figure 13.
Figure 13 Experiment 02 On Cloud Test Bed
5.3.3 Experiment-03 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 03 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 7; for encryption, CryptDB was installed on
the same machine as the database server. The three loads were executed successfully
against this implementation, as shown in Figure 14.
Figure 14 Experiment 03 On Cloud Test Bed
5.3.4 Experiment-04 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 04 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server, the database server
and the CryptDB server, with the specifications given in Table 7; installing CryptDB on a
separate machine gives additional security, since CryptDB is managed and processed
separately. The three loads were executed successfully against this implementation, as
shown in Figure 15.
Figure 15 Experiment 04 On Cloud Test Bed
5.3.5 Comparison Of Means Of Experiments On Amazon Cloud
Since in a real-world deployment the load on a website is a complex mixture of the
synthetic loads defined for this study, the mean of all three loads (light, medium and
heavy) is presented in Figure 16.
Keeping a threshold of 5000 ms for the response time, it was observed that even after the
introduction of encryption, a load of 200 concurrent users on the web server is still
manageable by the configuration selected in Experiment 04, as shown in Figure 16.
Figure 16 Comparison Of Mean Of Experiments On Cloud Test Bed
5.3.6 Comparison Of Means Of Extended Experiments On Amazon Cloud
As shown in Figure 16, a load of 200 concurrent users still stays under the threshold set
for this study (a response time below 5000 milliseconds), so the user load was extended
up to 500 users, adding 50 users per batch.
The threshold was reached at approximately 250 concurrent users. Hence, even with
encryption introduced, the high-throughput application still delivers a level of
performance that is implementable in production environments of high-throughput
domains.
Figure 17 Comparison Of Mean Throughput - Extended Test Cases
5.4 Upgradation Of Cloud Infrastructure
To mimic real-world, on-demand resource provisioning in cloud computing, the cloud
infrastructure of the CryptDB-enabled environment was upgraded step by step, and the
performance gain was measured after each upgrade. The upgraded servers used for these
experiments are listed below.
Sr. # | Machine    | vCPU | RAM   | Hard Disk
1     | Web Server | 2    | 7.5GB | 32GB SSD
2     | DB Server  | 2    | 7.5GB | 32GB SSD
3     | CryptDB    | 2    | 7.5GB | 32GB SSD
After the upgrade, the experiments were re-run on the upgraded machines and the
performance gain was recorded, as discussed in the following sections.
5.4.1 Upgraded Test Infrastructure
All three loads were executed against the encryption-enabled test bed infrastructure,
initially using a browser to validate the implementation and configuration. After this
validation, the same loads were defined in JMeter and executed to simulate from 0 to 500
concurrent users, increasing by 50 users per batch.
Below are the results obtained from the upgraded phase 02 tests.
5.4.2 Performance Gain Using Light Load
When the light load was executed against the upgraded configurations involving
encryption, the performance gains with respect to the Experiment-04 configuration were
recorded as shown in Figure 18.
Figure 18 Performance Gain After Upgrade - Light Load
It was observed that for a normal user load for high-throughput applications (50-150
concurrent users), the average performance gain was approximately 35% with respect to
the Experiment-04 configuration when the database server and the CryptDB server were
upgraded. Over the full user range, the highest overall performance gain is about 22%
relative to the Experiment-04 configuration, obtained with the upgrade that increases the
cost by 100%. Further details are shown in Figure 19.
Figure 19 Performance Gain vs Cost - Light Load
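The gain figures in this section and in Appendices F and G are read here as the relative
change in mean response time against the Experiment-04 baseline; assuming that
convention, the gain of an upgraded configuration is

gain(\%) = \frac{t_{Exp\text{-}04} - t_{upgraded}}{t_{Exp\text{-}04}} \times 100

so a table entry of -35.06 corresponds to roughly a 35% reduction in mean response time,
i.e. a 35% performance gain, while a positive entry indicates a slowdown.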
5.4.3 Performance Gain Using Medium Load
When the medium load was executed against the upgraded configurations involving
encryption, the performance gains with respect to the Experiment-04 configuration were
recorded as shown in Figure 20.
Figure 20 Performance Gain After Upgrade - Medium Load
It was observed that for a normal user load for high-throughput applications (50-150
concurrent users), the average performance gain was approximately 40% with respect to
the Experiment-04 configuration when the database server and the CryptDB server were
upgraded. Over the full user range, the highest overall performance gain is about 22%
relative to the Experiment-04 configuration, obtained with the upgrade that increases the
cost by 100%. Further details are shown in Figure 21.
Figure 21 Performance Gain vs Cost - Medium Load
5.4.4 Performance Gain Using Heavy Load
When the heavy load was executed against the upgraded configurations involving
encryption, the performance gains with respect to the Experiment-04 configuration were
recorded as shown in Figure 22.
Figure 22 Performance Gain After Upgrade - Heavy Load
It was observed that for a normal user load for high-throughput applications (50-150
concurrent users), the average performance gain was only about 4% with respect to the
Experiment-04 configuration when the database server and the CryptDB server were
upgraded; note that the heavy load is compute as well as network intensive. The highest
overall performance gain is about 8.5%, obtained with the CryptDB-server-only upgrade,
which increases the cost by 33%. Further details are shown in Figure 23.
Figure 23 Performance Gain vs Cost - Heavy Load
Chapter 6
CONCLUSION & RECOMMENDATIONS
6.1 Introduction
This chapter discusses the outcomes and benefits achieved from the execution of this
study, along with future directions and technical recommendations related to it.
6.2 Conclusion
Keeping in view the rising adoption of cloud computing, the security of data becomes a
risk, because outsourcing exposes data to three kinds of snoopers: hackers, server
administrators and government surveillance. Data security is therefore vital for
businesses, both because of the value of the data and because of the data protection and
privacy regulations imposed in many countries.
To ensure the confidentiality of data, encryption has been the technique of choice in
various forms for hundreds of years. However, it also brings overhead that is usually
unacceptable in high-throughput applications, and its major drawback is that the data
becomes unreadable until decrypted, while business applications rely heavily on
processing their data.
During this study, the latest approaches to query processing over encrypted data were
reviewed for selection and verification in an enterprise-grade, cloud-based production
environment. Among the available research solutions, CryptDB (a research product from
MIT, USA) was selected and implemented in a multi-tier environment on local machines,
simulating use cases from a national-scale, enterprise-grade electronic health
management solution, OpenEMR. Three use cases simulating light, medium and heavy
user load were designed keeping in view the workings of OpenEMR as well as the
capabilities of CryptDB, and a synthetic user load was generated from these test cases
with JMeter, an open source performance analysis tool.
After verification of the results on the local test bed, a production-like environment was
designed on Amazon's cloud, the same experiments were run on this enterprise-grade
environment, and the results were recorded for further analysis. The servers in the cloud-
based environment were then upgraded and the experiments were re-run to evaluate cost
versus performance gain.
It was observed that CryptDB successfully provides data confidentiality on the database
server and keeps all three kinds of snoopers from getting their hands on the
organization's data stored there, while keeping the performance within satisfactory
limits in terms of response time to the end users. CryptDB protects the confidentiality of
the data in the case of a database server compromise, whether by hackers with root-level
access, by a curious database administrator at the cloud provider, or by a government
forcing the cloud infrastructure provider to release the data.
A major advantage of CryptDB is its minimal implementation overhead: it sits
transparently between the application and the database server and requires almost no
changes to the application or the database design. The changes that are required are
handled automatically, which makes its implementation cost-effective as well as
effective from a security standpoint.
The same solution can be equally beneficial outside the cloud, in privately hosted
scenarios. In that case it protects against insider attacks (the organization's own database
administrator taking the place of the cloud provider's administrator) as well as against
hackers and government spying.
6.3 Recommendations
Based on this study, CryptDB is a tested solution for ensuring the confidentiality of data
in the database servers of enterprise-grade multi-tier applications while keeping
performance acceptable. It can be equally beneficial for cloud-based setups (where the
servers are hosted on cloud infrastructure) and privately hosted setups; in privately
hosted setups it may even perform better than in the results of this study, since network
latency there is much lower than in cloud-based setups.
CryptDB can be used by financial institutions, medical organizations, general-purpose
businesses, government agencies and the defense sector, wherever data must be
encrypted to ensure confidentiality without a noticeable degradation in performance.
6.4 Future Work
The implementability shown in this study is a helpful step towards further work in data
security, such as devising and testing solutions that also ensure the integrity of the data.
Further, there is room to adapt the concepts used in CryptDB to NoSQL-based solutions
and to encrypted query processing over Big Data.
APPENDICES
A - SERVER INFRASTRUCTURE LOCAL TEST BED
Sr. # | Machine             | vCPU | RAM | Hard Disk
1     | Web Server          | 1    | 1GB | 8GB SSD
2     | DB Server           | 1    | 1GB | 8GB SSD
3     | User Load Simulator | 1    | 1GB | 8GB SSD
Table 12 Server Infrastructure - Local Test Bed
B - SERVER INFRASTRUCTURE CLOUD TEST BED
Sr. # | Machine    | vCPU | RAM | Hard Disk
1     | Web Server | 2    | 4GB | 8GB SSD
2     | DB Server  | 2    | 4GB | 8GB SSD
3     | CryptDB    | 2    | 4GB | 8GB SSD
Table 13 Server Infrastructure - Cloud Test Bed
C - SERVER INFRASTRUCTURE CLOUD TEST BED (EXTENDED)
Sr. # | Machine    | vCPU | RAM   | Hard Disk
1     | Web Server | 2    | 7.5GB | 32GB SSD
2     | DB Server  | 2    | 7.5GB | 32GB SSD
3     | CryptDB    | 2    | 7.5GB | 32GB SSD
Table 14 Server Infrastructure - Cloud Test Bed (Extended)
D PHASE 01 EXPERIMENTS RESULTS
EXPERIMENT-01 ON LOCAL TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 7          | 8           | 16
20           | 8          | 8           | 20
30           | 9          | 10          | 26
40           | 10         | 11          | 50
50           | 15         | 16          | 113
60           | 36         | 27          | 160
70           | 40         | 41          | 196
80           | 80         | 80          | 231
90           | 112        | 80          | 273
100          | 154        | 139         | 300
Table 15 Local Test - Experiment 01 (mean response time in ms)
EXPERIMENT-02 ON LOCAL TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 13         | 9           | 18
20           | 16         | 10          | 24
30           | 16         | 12          | 35
40           | 20         | 14          | 75
50           | 36         | 22          | 116
60           | 57         | 33          | 163
70           | 53         | 65          | 230
80           | 143        | 99          | 245
90           | 158        | 122         | 263
100          | 126        | 138         | 309
Table 16 Local Test - Experiment 02 (mean response time in ms)
EXPERIMENT-03 ON LOCAL TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 32         | 51          | 4243
20           | 107        | 287         | 7204
30           | 253        | 558         | 9929
40           | 464        | 715         | 11529
50           | 639        | 867         | 14055
60           | 699        | 987         | 13856
70           | 765        | 1081        | 15788
80           | 906        | 1196        | 17661
90           | 1067       | 1252        | 21826
100          | 1105       | 1294        | 21016
Table 17 Local Test - Experiment 03 (mean response time in ms)
PHASE 01 - COMPARISON OF MEAN RESPONSE TIME OF SELECTED TEST CASES
No. of Users | Experiment-01 | Experiment-02 | Experiment-03 | Experiment-04
0            | 0             | 0             | 0             | NA
10           | 10            | 13            | 1442          | NA
20           | 12            | 17            | 2533          | NA
30           | 15            | 21            | 3580          | NA
40           | 24            | 36            | 4236          | NA
50           | 48            | 58            | 5187          | NA
60           | 74            | 84            | 5181          | NA
70           | 92            | 116           | 5878          | NA
80           | 130           | 162           | 6588          | NA
90           | 155           | 181           | 8048          | NA
100          | 198           | 191           | 7805          | NA
Table 18 Comparison Of Mean Response Time (Phase 01, ms)
E PHASE 02 (CLOUD BASED) EXPERIMENTS RESULTS
EXPERIMENT 01 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 2          | 1           | 3
20           | 1          | 1           | 3
30           | 1          | 1           | 4
40           | 2          | 1           | 8
50           | 2          | 1           | 13
60           | 2          | 1           | 19
70           | 4          | 2           | 27
80           | 6          | 4           | 34
90           | 8          | 6           | 37
100          | 18         | 8           | 44
110          | 14         | 13          | 52
120          | 18         | 14          | 60
130          | 23         | 14          | 70
140          | 22         | 11          | 71
150          | 26         | 22          | 78
160          | 31         | 29          | 82
170          | 22         | 29          | 86
180          | 27         | 17          | 96
190          | 30         | 35          | 100
200          | 31         | 40          | 106
Table 19 Cloud Test - Experiment 01 (mean response time in ms)
EXPERIMENT 02 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 2          | 2           | 4
20           | 2          | 2           | 4
30           | 2          | 2           | 5
40           | 2          | 2           | 8
50           | 2          | 2           | 13
60           | 2          | 2           | 38
70           | 4          | 4           | 28
80           | 16         | 6           | 30
90           | 12         | 9           | 37
100          | 4          | 10          | 41
110          | 13         | 40          | 44
120          | 20         | 26          | 49
130          | 26         | 28          | 55
140          | 27         | 31          | 64
150          | 31         | 27          | 64
160          | 23         | 28          | 71
170          | 34         | 28          | 76
180          | 24         | 32          | 81
190          | 36         | 35          | 85
200          | 32         | 37          | 92
Table 20 Cloud Test - Experiment 02 (mean response time in ms)
EXPERIMENT 03 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 12         | 15          | 336
20           | 53         | 54          | 837
30           | 83         | 88          | 1380
40           | 115        | 133         | 1863
50           | 139        | 162         | 2305
60           | 192        | 208         | 2784
70           | 236        | 270         | 3346
80           | 324        | 345         | 3887
90           | 331        | 467         | 4402
100          | 399        | 480         | 4863
110          | 392        | 531         | 5369
120          | 448        | 517         | 5894
130          | 483        | 619         | 6503
140          | 544        | 689         | 7100
150          | 566        | 713         | 7644
160          | 619        | 758         | 8108
170          | 630        | 771         | 8658
180          | 670        | 762         | 9338
190          | 712        | 787         | 9796
200          | 753        | 839         | 10417
Table 21 Cloud Test - Experiment 03 (mean response time in ms)
EXPERIMENT 04 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 15         | 16          | 336
20           | 42         | 54          | 837
30           | 73         | 87          | 1380
40           | 101        | 122         | 1863
50           | 137        | 166         | 2305
60           | 165        | 206         | 2784
70           | 207        | 237         | 3346
80           | 250        | 275         | 3887
90           | 280        | 350         | 4402
100          | 278        | 433         | 4863
110          | 347        | 490         | 4005
120          | 399        | 498         | 5665
130          | 487        | 560         | 7489
140          | 561        | 559         | 8123
150          | 482        | 544         | 7774
160          | 569        | 580         | 8175
170          | 620        | 593         | 9605
180          | 611        | 635         | 10595
190          | 631        | 682         | 11146
200          | 688        | 713         | 11821
Table 22 Cloud Test - Experiment 04 (mean response time in ms)
COMPARISON OF MEANS OF EXPERIMENTS ON CLOUD TEST BED
No. of Users | Exp 01 | Exp 02 | Exp 03 | Exp 04
0            | 0      | 0      | 0      | 0
10           | 2      | 3      | 121    | 233
20           | 2      | 3      | 315    | 497
30           | 2      | 3      | 517    | 519
40           | 4      | 4      | 704    | 713
50           | 5      | 6      | 869    | 914
60           | 7      | 14     | 1061   | 1121
70           | 11     | 12     | 1284   | 1374
80           | 15     | 17     | 1519   | 1558
90           | 17     | 19     | 1733   | 1795
100          | 23     | 18     | 1914   | 1985
110          | 26     | 32     | 2097   | 1614
120          | 31     | 32     | 2286   | 2187
130          | 36     | 36     | 2535   | 2845
140          | 35     | 41     | 2778   | 3081
150          | 42     | 41     | 2974   | 2933
160          | 47     | 41     | 3162   | 3108
170          | 46     | 46     | 3353   | 3606
180          | 47     | 46     | 3590   | 3947
190          | 55     | 52     | 3765   | 4153
200          | 59     | 54     | 4003   | 4407
Table 23 Cloud Test - Comparison Of Mean Response Time
F PERFORMANCE GAIN AFTER UPGRADE
PERFORMANCE GAIN IN PERCENTAGE (LIGHT LOAD)
No. of Users | Web+DB Separate Server, Remote CDB | Upgrade - CryptDB Server | Upgrade - CryptDB Server + Database | Upgrade - Web Server + CryptDB Server + Database
0            | 0    | 0      | 0      | 0
50           | 0.00 | -5.84  | -29.93 | -41.61
100          | 0.00 | 6.83   | -24.46 | -30.58
150          | 0.00 | -28.42 | -25.52 | -32.99
200          | 0.00 | -23.26 | -27.33 | -30.38
250          | 0.00 | -6.81  | -20.31 | -18.40
300          | 0.00 | -9.40  | -17.67 | -10.54
350          | 0.00 | -13.07 | -18.40 | -13.07
400          | 0.00 | -10.20 | -19.76 | -12.85
450          | 0.00 | 3.48   | -7.96  | -13.35
500          | 0.00 | 1.04   | -11.24 | -14.72
AVG          | 0.00 | -7.79  | -18.42 | -19.86
Table 24 Performance Gain - Light Load
PERFORMANCE GAIN IN PERCENTAGE (MEDIUM LOAD)
No. of Users | Web+DB Separate Server, Remote CDB | Upgrade - CryptDB Server | Upgrade - CryptDB Server + Database | Upgrade - Web Server + CryptDB Server + Database
0            | 0    | 0      | 0      | 0
50           | 0.00 | -5.42  | -11.45 | -43.37
100          | 0.00 | -33.03 | -41.34 | -46.88
150          | 0.00 | -18.38 | -12.50 | -30.88
200          | 0.00 | -5.47  | -13.32 | -11.08
250          | 0.00 | 5.12   | 4.66   | -21.89
300          | 0.00 | -2.88  | -14.39 | -23.12
350          | 0.00 | 7.94   | -5.08  | -9.79
400          | 0.00 | 9.90   | -2.29  | -11.21
450          | 0.00 | 8.78   | -3.94  | -9.90
500          | 0.00 | 6.61   | -4.28  | -10.29
AVG          | 0.00 | -2.44  | -9.45  | -19.86
Table 25 Performance Gain - Medium Load
PERFORMANCE GAIN IN PERCENTAGE (HEAVY LOAD)
No. of Users | Web+DB Separate Server, Remote CDB | Upgrade - CryptDB Server | Upgrade - CryptDB Server + Database | Upgrade - Web Server + CryptDB Server + Database
0            | 0    | 0      | 0      | 0
50           | 0.00 | -1.52  | -3.90  | 2.95
100          | 0.00 | -3.24  | -4.90  | 0.65
150          | 0.00 | -1.71  | -2.87  | 2.05
200          | 0.00 | -13.11 | -13.26 | -16.48
250          | 0.00 | -8.69  | -8.24  | -6.71
300          | 0.00 | -14.35 | -12.81 | -12.75
350          | 0.00 | -11.65 | -7.88  | -7.05
400          | 0.00 | -10.80 | -5.65  | -5.04
450          | 0.00 | -14.25 | -12.25 | -10.38
500          | 0.00 | -7.87  | -3.05  | -4.12
AVG          | 0.00 | -7.93  | -6.80  | -5.17
Table 26 Performance Gain - Heavy Load
G PERFORMANCE GAIN vs COST OF UPGRADE
LIGHT LOAD
Increase In | Experiment-04 | Upgrade - CryptDB Server | Upgrade - CryptDB & Database Server | Upgrade - CryptDB, Database & Web Server
0-500 Users | 0.00          | -8.56                    | -20.26                              | -21.85
0-150 Users | 0.00          | -9.14                    | -26.64                              | -35.06
Cost        | 0%            | 33%                      | 67%                                 | 100%
Table 27 Performance Gain vs Cost (Light Load)
MEDIUM LOAD
Increase In | Experiment-04 | Upgrade - CryptDB Server | Upgrade - CryptDB & Database Server | Upgrade - CryptDB, Database & Web Server
0-500 Users | 0.00          | -2.68                    | -10.39                              | -21.84
0-150 Users | 0.00          | -18.94                   | -21.76                              | -40.38
Cost        | 0%            | 33%                      | 67%                                 | 100%
Table 28 Performance Gain vs Cost (Medium Load)
HEAVY LOAD
Increase In | Experiment-04 | Upgrade - CryptDB Server | Upgrade - CryptDB & Database Server | Upgrade - CryptDB, Database & Web Server
0-500 Users | 0.00          | -8.72                    | -7.48                               | -5.69
0-150 Users | 0.00          | -2.16                    | -3.89                               | 1.88
Cost        | 0%            | 33%                      | 67%                                 | 100%
Table 29 Performance Gain vs Cost (Heavy Load)
REFERENCES
[1] C. Manning, "Chelsea Manning." [Online]. Available: http://www.chelseamanning.org. [Accessed 25 12 2014].
[2] Wikipedia, "Chelsea Manning." [Online]. Available: http://en.wikipedia.org/wiki/Chelsea_Manning. [Accessed 25 12 2014].
[3] G. Greenwald, No Place To Hide: Edward Snowden, the NSA and the Surveillance State, 1st ed., Metropolitan Books, 2014, p. 272.
[4] Wikipedia, "Edward Snowden." [Online]. Available: http://en.wikipedia.org/wiki/Edward_Snowden. [Accessed 25 12 2014].
[5] O. T. Alliance, "2014 Data Protection & Breach Readiness Guide - Overview," 2014. [Online]. Available: https://www.otalliance.org/resources/2014-data-protection-breach-readiness-guide-overview.
[6] R. A. Popa, Building Practical Systems That Compute on Encrypted Data, Massachusetts, 2014.
[7] O. T. Alliance, "Security and privacy enhancing best practices," 21 01 2015. [Online]. Available: https://www.otalliance.org/system/files/files/resource/documents/ota2015-bestpractices.pdf. [Accessed 25 01 2015].
[8] privacyrights.org, "Chronology of Data Breaches," Privacy Rights Clearinghouse. [Online]. Available: http://www.privacyrights.org/data-breach. [Accessed 25 01 2015].
[9] S. Tu, M. F. Kaashoek, S. Madden and N. Zeldovich, "Processing analytical queries over encrypted data," in Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), Trento, 2013.
[10] Wikipedia, "PRISM (surveillance program)." [Online]. Available: http://en.wikipedia.org/wiki/PRISM_%28surveillance_program%29. [Accessed 31 12 2014].
[11] R. A. Popa, C. M. S. Redfield, N. Zeldovich and H. Balakrishnan, "CryptDB: Protecting Confidentiality with Encrypted Query Processing," in 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, 2011.
[12] R. L. Rivest, L. Adleman and M. L. Dertouzos, "On Data Banks and Privacy Homomorphisms," Foundations of Secure Computation, pp. 168-179, 1978.
[13] D. X. Song, D. Wagner and A. Perrig, "Practical Techniques for Searches on Encrypted Data," in 2000 IEEE Symposium on Security and Privacy, Washington, DC, USA, 2000.
[14] O. Goldreich, Foundations of Cryptography: Volume II, Basic Applications, Cambridge University Press, 2004.
[15] C. Gentry, "Fully Homomorphic Encryption Using Ideal Lattices," in 41st Annual ACM Symposium on Theory of Computing, New York, NY, USA, 2009.
[16] M. van Dijk, C. Gentry, S. Halevi and V. Vaikuntanathan, "Fully Homomorphic Encryption over the Integers," in Advances in Cryptology - EUROCRYPT 2010, 2010.
[17] D. Stehlé and R. Steinfeld, "Faster Fully Homomorphic Encryption," in Advances in Cryptology - ASIACRYPT 2010, 2010.
[18] Z. Brakerski and V. Vaikuntanathan, "Efficient Fully Homomorphic Encryption from (Standard) LWE," in Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, 2011.
[19] Z. Brakerski and V. Vaikuntanathan, "Fully Homomorphic Encryption from Ring-LWE and Security for Key Dependent Messages," in Advances in Cryptology - CRYPTO 2011, 2011.
[20] D. Boneh, E.-J. Goh and K. Nissim, "Evaluating 2-DNF Formulas on Ciphertexts," in Second Theory of Cryptography Conference, 2005.
[21] T. ElGamal, "A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms," in Proceedings of CRYPTO 84, 1985.
[22] P. Paillier, "Public-key cryptosystems based on composite degree residuosity," in International Conference on the Theory and Application of Cryptographic Techniques, 1999.
[23] S. Goldwasser and S. Micali, "Probabilistic encryption & how to play mental poker keeping secret all partial information," in Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, 1982.
[24] A. Sahai and B. Waters, "Fuzzy Identity-Based Encryption," in Advances in Cryptology - EUROCRYPT 2005, 2005.
[25] V. Goyal, O. Pandey, A. Sahai and B. Waters, "Attribute-based encryption for fine-grained access control of encrypted data," in 13th ACM Conference on Computer and Communications Security, 2006.
[26] J. Katz, A. Sahai and B. Waters, "Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products," in 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, 2008.
[27] D. Boneh, A. Sahai and B. Waters, "Functional encryption: Definitions and challenges," in 8th Theory of Cryptography Conference, 2011.
[28] R. A. Popa, E. Stark, J. Helfer, S. Valdez, N. Zeldovich, M. F. Kaashoek and H. Balakrishnan, "Building web applications on top of encrypted data using Mylar," in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), Seattle, 2014.
[29] R. A. Popa and N. Zeldovich, "Multi-Key Searchable Encryption," Cryptology ePrint Archive, Report 2013/508, 2013. [Online]. Available: http://eprint.iacr.org.
[30] B. A. Forouzan, Cryptography & Network Security, 2nd ed., McGraw-Hill Education, 2010.
[31] A. Boldyreva, N. Chenette, Y. Lee and A. O'Neill, "Order-Preserving Symmetric Encryption," in 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 2009.
[32] OpenEMR, "OpenEMR Project." [Online]. Available: http://www.open-emr.org/. [Accessed 17 12 2014].
[33] Apache, "Apache." [Online]. Available: http://www.apache.org/. [Accessed 17 12 2014].
[34] Oracle, "MySQL: The world's most popular open source database." [Online]. Available: http://www.mysql.com. [Accessed 17 12 2014].
[35] Apache, "Apache JMeter." [Online]. Available: http://jmeter.apache.org/. [Accessed 17 12 2014].
[36] R. A. Popa, C. Redfield, S. Tu, H. Balakrishnan, F. Kaashoek, S. Madden, N. Zeldovich and A. Burrows, "CryptDB." [Online]. Available: http://css.csail.mit.edu/cryptdb/. [Accessed 17 12 2014].
We initiate the formal study of functional encryption by giving precise definitions of the concept and its security. Roughly speaking, functional encryption supports restricted secret keys that enable a key holder to learn a specific function of encrypted data, but learn nothing else about the data. For example, given an encrypted program the secret key may enable the key holder to learn the output of the program on a specific input without learning anything else about the program. We show that defining security for functional encryption is non-trivial. First, we show that a natural game-based definition is inadequate for some functionalities. We then present a natural simulation-based definition and show that it (provably) cannot be satisfied in the standard model, but can be satisfied in the random oracle model. We show how to map many existing concepts to our formalization of functional encryption and conclude with several interesting open problems in this young area.