COST & PERFORMANCE EVALUATION OF
DATA CONFIDENTIALITY IN HIGH
THROUGHPUT/CLOUD BASED MULTI-TIER
APPLICATIONS
By:
Mr. Faisal Shahzad
UET-11F-MSIS-CASE-04
Supervisor
Dr. Waheed Iqbal
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
CENTRE FOR ADVANCED STUDIES IN ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
TAXILA
Semester Fall 2014
COST & PERFORMANCE EVALUATION OF DATA
CONFIDENTIALITY IN HIGH THROUGHPUT/CLOUD
BASED MULTI-TIER APPLICATIONS
A report submitted in partial fulfillment of the requirements for the M.Sc.
Thesis
By:
Mr. Faisal Shahzad
UET-11F-MSIS-CASE-04
Approved by:
_____________________
Supervisor:
Dr. Waheed Iqbal
_____________________
External Examiner:
Dr. Zia Ud Din
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
CENTRE FOR ADVANCED STUDIES IN ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY TAXILA
Semester Fall 2014
TABLE OF CONTENTS
TABLE OF CONTENTS ............................................................................................................. iii
DECLARATION........................................................................................................................ viii
DEDICATION ............................................................................................................................. ix
ACKNOWLEDGEMENT ............................................................................................................ x
PROFILES ..................................................................................................................................... xi
Supervisor (Dr. Waheed Iqbal) .............................................................................................. xi
External Supervisor (Dr. Zia Ud Din) .................................................................................. xi
Members Research Committee (Dr. Farrukh Kamran) ...................................................... xi
Members Research Committee (Dr. Shafaat Ahmed Bazaz) ............................................ xi
LIST OF ABBREVIATIONS AND ACRONYMS ................................................................... xii
ABSTRACT ................................................................................................................................ xiii
LIST OF FIGURES ..................................................................................................................... xiv
LIST OF TABLES ........................................................................................................................ xv
Chapter 1 ....................................................................................................................................... 1
DATA SECURITY IN HIGH THROUGHPUT/CLOUD BASED APPLICATION ............ 1
1.1 Introduction ........................................................................................................................ 1
1.2 Motivation ........................................................................................................................... 2
1.2.1 Need For Data Protection ........................................................................................... 2
1.2.2 Multi-Tier Architecture ............................................................................................... 4
1.2.3 Cloud Computing ........................................................................................................ 5
1.2.4 Government Spying .................................................................................................... 6
1.2.5 End User Privacy ......................................................................................................... 6
1.3 Objectives of study ............................................................................................................. 7
1.4 Scope and Contribution ..................................................................................................... 9
1.5 Limitations ......................................................................................................................... 10
1.6 Significance of the study ................................................................................................. 10
1.7 Thesis Outline ................................................................................................................... 11
Chapter 2 ..................................................................................................................................... 12
LITERATURE REVIEW ............................................................................................................. 12
2.1 Introduction ...................................................................................................................... 12
2.2 Application Architecture ................................................................................................. 13
2.3 Ensuring Confidentiality Through Encryption............................................................ 14
2.4 Homomorphic Encryption .............................................................................................. 16
2.4.1 Fully Homomorphic Encryption [FHE] ................................................................. 16
2.4.2 Partial Homomorphic Encryption (PHE) ............................................................... 18
2.4.3 Functional Encryption (FE) ...................................................................................... 18
2.5 Current Frameworks ........................................................................................................ 19
2.5.1 CryptDB ...................................................................................................................... 19
2.5.2 Monami ....................................................................................................................... 20
2.5.3 Mylar ........................................................................................................................... 21
2.5.4 Selection for test implementation ............................................................................ 22
Chapter 3 ..................................................................................................................................... 26
CRYPTDB .................................................................................................................................... 26
3.1 Introduction ...................................................................................................................... 26
3.2 Security in Multi-Tier Architecture Using CryptDB ................................................... 26
3.3 Query Execution and Data Confidentiality .................................................................. 29
3.3.1 CryptDB Scope For Confidentiality ........................................................................ 30
3.3.2 CryptDB Encryption Schemes ................................................................................. 31
3.3.2.1 RND Scheme ....................................................................................................... 32
3.3.2.2 DET Scheme ........................................................................................................ 32
3.3.2.4 Paillier Cryptosystem ........................................................................................ 33
3.3.2.5 SEARCH .............................................................................................................. 33
Chapter 4 ..................................................................................................................................... 34
RESEARCH METHODOLOGY ............................................................................................... 34
4.1 Introduction ...................................................................................................................... 34
4.2 Sample Application Scenario .......................................................................................... 34
4.3 Test Scenario ..................................................................................................................... 36
4.3.1 Technologies Used During Experiments ................................................................ 37
4.3.2 Implementation Scenarios ........................................................................................ 37
4.4 Test Execution ................................................................................................................... 38
4.4.1 Experiment On Local Test Bed ................................................................................ 38
4.4.2 Experiments On Amazon Cloud ............................................................................. 39
4.5 Test Use Cases ................................................................................................................... 41
4.5.1 Test Bed Environment ............................................................................................... 41
4.5.2 Client Side Load ......................................................................................................... 42
Chapter 5 ..................................................................................................................................... 43
EXPERIMENTS & RESULTS .................................................................................................... 43
5.1 Introduction ...................................................................................................................... 43
5.2 Experiments On Local Test Bed ..................................................................................... 43
5.2.1 Experiment-01 On Local Test Bed ........................................................................... 45
5.2.2 Experiment-02 On Local Test Bed ........................................................................... 46
5.2.3 Experiment-03 On Local Test Bed ........................................................................... 47
5.2.4 Comparison Of Means Of Experiments On Local Test Bed ................................ 48
5.3 Experiments On Amazon Cloud .................................................................................... 49
5.3.1 Experiment-01 On Amazon Cloud .......................................................................... 50
5.3.2 Experiment-02 On Amazon Cloud ..................................................................... 51
5.3.3 Experiment-03 On Amazon Cloud ..................................................................... 52
5.3.4 Experiment-04 On Amazon Cloud ..................................................................... 53
5.3.4 Comparison Of Means Of Experiments On Amazon Cloud .......................... 54
5.3.5 Comparison Of Means Of Extended Experiments On Amazon Cloud ........ 55
5.4 Upgradation Of Cloud Infrastructure ........................................................................... 56
5.4.1 Upgraded Test Infrastructure .................................................................................. 56
5.4.2 Performance Gain Using Light Load ...................................................................... 57
5.4.2 Performance Gain Using Medium Load ................................................................ 58
5.4.3 Performance Gain Using Heavy Load .................................................................... 60
Chapter 6 ..................................................................................................................................... 63
CONCLUSION & RECOMMENDATIONS ........................................................................... 63
6.1 Introduction ...................................................................................................................... 63
6.2 Conclusion ......................................................................................................................... 63
6.3 Recommendations ............................................................................................................ 65
6.4 Future Work ...................................................................................................................... 66
APPENDICES ............................................................................................................................. 67
A - SERVER INFRASTRUCTURE LOCAL TEST BED .................................................. 67
B - SERVER INFRASTRUCTURE CLOUD TEST BED .................................................. 67
C - SERVER INFRASTRUCTURE CLOUD TEST BED (EXTENDED) ......................... 67
D - PHASE 01 EXPERIMENTS RESULTS ........................................................................ 68
E - PHASE 02 (CLOUD BASED) EXPERIMENTS RESULTS ........................................ 70
F - PERFORMANCE GAIN AFTER UPGRADE .............................................................. 72
G - PERFORMANCE GAIN vs COST OF UPGRADE .................................................... 74
REFERENCES ............................................................................................................................. 75
DECLARATION
The substance of this thesis is the original work of the author, and due
references and acknowledgements have been made, where necessary, to the
work of others. No part of this thesis has already been accepted for any
degree, nor is it currently being submitted in candidature for any degree.
_______________
Mr. Faisal Shahzad
UET-11F-MSIS-CASE-04
M.Sc. Thesis Scholar
Countersigned:
______________
Dr. Waheed Iqbal
Thesis Supervisor
DEDICATION
To the knowledge,
And to those who are continuously struggling
To add in it
To spread it
To make people like us able to understand it
Love You All
ACKNOWLEDGEMENT
A special thanks to Allah Almighty; His kindness and blessings spread over my whole life and make every achievement in my life possible for me. Alhamdulillah.
I would like to express my special appreciation and thanks to my advisor, Professor Dr. Waheed Iqbal; you have been a tremendous mentor for me. I would like to thank you for encouraging my research and for allowing me to grow as a research student. Your advice has been priceless.
I would also like to thank my committee members, Professor Dr. Furrukh Kamran, Professor Dr. Zia Ud Din, and Professor Dr. Shafaat A Bazaaz, for serving on my committee. I would especially like to thank the CASE management for their kind support, especially Mr. Zeeshan Saleem for his kind guidance and support as and when required.
A special thanks goes to my family. Words cannot express how grateful I am to you all. Your prayers for me are what sustained me thus far.
PROFILES
Supervisor (Dr. Waheed Iqbal)
Ph.D., Cloud Computing, 2012 Asian Institute of Technology, Thailand
M.Eng., Computer Science, 2009 Asian Institute of Technology, Thailand and Technical
University Catalonia, Spain
B.S., Software Engineering, 2005 Bahria University Karachi Campus, Pakistan
External Supervisor (Dr. Zia Ud Din)
Postdoctoral Research, Computer Science, Feb 2012-Aug 2012, University of Nice, Sophia
Antipolis, France
Ph.D., Computer Science, 2009 Asian Institute of Technology, Thailand
M.S., Computer Science, 2003 Bahria University, Islamabad Campus, Pakistan
B.Eng., Civil Engineering, 2000 UCET, Baha Uddin Zakariya University, Multan,
Pakistan
Members Research Committee (Dr. Farrukh Kamran)
Ph.D., Electrical Engineering, 1995 Georgia Institute of Technology, Atlanta, GA USA
M.S., Electrical Engineering, 1992 Georgia Institute of Technology, Atlanta, GA USA
B.Sc. (Eng.), Electrical Engineering, University of Engineering & Technology, Lahore
Members Research Committee (Dr. Shafaat Ahmed Bazaz)
Ph.D., Controls and Computer Sciences, 1998 Institut National des Sciences Appliquées
(INSA), Toulouse, France
M.S., 1994 Université de Franche-Comté, Besançon, France
B.S., 1989 NED University of Engineering and Technology, Karachi, Pakistan
LIST OF ABBREVIATIONS AND ACRONYMS
Abbreviation   Details
AES            Advanced Encryption Standard
DB             Database
DBMS           Database Management System
FHE            Fully Homomorphic Encryption
LAMP           Linux, Apache, MySQL, PHP
OLAP           Online Analytical Processing
OLTP           Online Transaction Processing
PHE            Partial Homomorphic Encryption
SQL            Structured Query Language
ABSTRACT
Cloud computing attracts a large number of users to host their applications and data, mainly due to on-demand resource provisioning and pay-as-you-go features. Web applications are one of the important types of applications deployed over the cloud. However, application owners are concerned about their data privacy and security. One of the key techniques to ensure data security (confidentiality aspect only) is encryption and decryption; however, it introduces overhead in the performance of the application. From the end user's point of view, response time is one of the main performance metrics.
In this thesis, we study possible mechanisms to address the data privacy and security concerns of the owners of cloud-hosted applications without requiring modification of application code. We identified CryptDB as one of the possible solutions that can be integrated with web applications without requiring code changes. CryptDB claims to provide confidentiality over databases and allows execution of queries over encrypted data with minimal overhead. In this thesis, we perform a cost and performance analysis of using CryptDB with a multi-tier web application hosted on the Amazon cloud using different configurations. Our experimental evaluation shows that a specific response time can be provided for a large number of users; however, a substantial increase in cost by upgrading the infrastructure brings up to a 40% gain in performance, if required per the needs of the organization.
LIST OF FIGURES
Figure 1 Three Dimensions Of Data Snoopers ........................................................................ 4
Figure 2 Multi-Tier Architecture .............................................................................................. 14
Figure 3 CryptDB Architecture [9] .......................................................................................... 20
Figure 4 Overall Architecture Of Monami [6] ....................................................................... 21
Figure 5 Mylar Architecture [13] ............................................................................................. 22
Figure 6 Model For Design Of Experiment For This Research Study ................................ 27
Figure 7 System Flow Of CryptDB .......................................................................................... 30
Figure 8 Experiment 01 On Local Test Bed ............................................................................ 45
Figure 9 Experiment 02 On Local Test Bed ............................................................................ 46
Figure 10 Experiment 03 On Local Test Bed .......................................................................... 47
Figure 11 Comparison of MEAN of experiments performed on Local Test bed .............. 48
Figure 12 Experiment 01 On Cloud Test Bed ......................................................................... 50
Figure 13 Experiment 02 On Cloud Test Bed ......................................................................... 51
Figure 14 Experiment 03 On Cloud Test Bed ......................................................................... 52
Figure 15 Experiment 04 On Cloud Test Bed ......................................................................... 53
Figure 16 Comparison Of Mean Of Experiments On Cloud Test Bed ............................... 54
Figure 17 Comparison Of Mean Throughput - Extended Test Cases ................................ 55
Figure 18 Performance Gain After Upgrade - Light Load ................................................... 57
Figure 19 Performance Gain vs Cost - Light Load ................................................................ 58
Figure 20 Performance Gain After Upgrade - Medium Load ............................................. 59
Figure 21 Performance Gain vs Cost - Medium Load .......................................................... 60
Figure 22 Performance Gain After Upgrade - Heavy Load ................................................. 61
Figure 23 Performance Gain vs Cost - Heavy Load .............................................................. 62
LIST OF TABLES
Table 1 Ease Of Implementation Provided By Systems Under Evaluation ....................... 23
Table 2 Range Of Security Provided By Systems Under Evaluation ................................. 24
Table 3 Range Of Functionality Provided By Systems Under Evaluation ........................ 25
Table 4 Comparison Of Solutions ............................................................................................ 25
Table 5 Implementation Scenarios ........................................................................................... 38
Table 6 Specifications of Machines Used In Phase 01 ........................................................... 39
Table 7 Specifications of Machines Used In Phase 02 ........................................................... 40
Table 8 Enhanced Specifications of Machines Used In Phase 02 ........................................ 41
Table 9 Client Side Load ........................................................................................................... 42
Table 10 Phase 01 Test Bed ....................................................................................................... 43
Table 11 Phase 2 Test Bed ......................................................................................................... 49
Table 12 Server Infrastructure - Local Test Bed ..................................................................... 67
Table 13 Server Infrastructure - Cloud Test Bed ................................................................... 67
Table 14 Server Infrastructure - Cloud Test Bed (Extended) ............................................... 67
Table 15 Local Test - Experiment 01 ........................................................................................ 68
Table 16 Local Test - Experiment 02 ........................................................................................ 68
Table 17 Local Test - Experiment 03 ........................................................................................ 69
Table 18 Comparison Of Mean Response Time (Phase 01) ................................................. 69
Table 19 Cloud Test - Experiment 01 ...................................................................................... 70
Table 20 Cloud Test - Experiment 02 ...................................................................................... 70
Table 21 Cloud Test - Experiment 03 ...................................................................................... 71
Table 22 Cloud Test - Experiment 04 ...................................................................................... 71
Table 23 Cloud Test - Comparison Of Mean Response Time .............................................. 72
Table 24 Performance Gain - Light Load ................................................................................ 72
Table 25 Performance Gain - Medium Load .......................................................................... 73
Table 26 Performance Gain - Heavy Load.............................................................................. 73
Table 27 Performance Gain vs Cost (Light Load) ................................................................. 74
Table 28 Performance Gain vs Cost (Medium Load) ............................................................ 74
Table 29 Performance Gain vs Cost (Heavy Load) ............................................................... 74
Chapter 1
DATA SECURITY IN HIGH THROUGHPUT/CLOUD
BASED APPLICATION
1.1 Introduction
The information technology revolution has brought a new way of managing things using information and communication technologies. For the modern world, the adoption of this new approach has resulted in data being generated everywhere. This data relates to every aspect of the underlying systems and therefore must be available as and when required. However, the fast, optimized, cost-effective and efficient solutions offered by adopting information and communication technologies in real-world scenarios also bring a serious concern with them: the security of data.
In the recent past, the security of confidential data has become a major issue, highlighted in particular after the incident in which secret documents were breached by Chelsea Manning [1] [2] and released to WikiLeaks. This incident had a severe impact on United States defense. Later, Edward Snowden [3] [4] revealed many hidden truths about the US surveillance and spying program used to spy on the whole world.
2014 was the most devastating year in terms of leakage of confidential data to adversaries. One study shows that approximately 740 million records were breached by malicious entities [5].
1.2 Motivation
1.2.1 Need For Data Protection
Organizations run on corporate data that is vital to their existence. This includes (but is not limited to) financial documents, policies, future plans, marketing strategies, research documents, employee details, customer details and so on. This data acts like blood in the human body: it flows from department to department and provides the vital information necessary to run business functions.
Keeping in view how precious this data is, data owners always tend to protect the data that is vital for their organization. Much like an "equal and opposite reaction", snoopers, on the other hand, try to get their hands on such corporate data maliciously to gain benefits. This results in a continuous war between data owners and data snoopers.
Data owners take measures to protect the confidentiality, integrity and availability of their data, whereas data snoopers try to compromise these three factors.
It is worth mentioning that almost every system built today protects data by preventing snoopers from breaking into it [6]. This strategy uses different means at different layers of the system to maximize protection. The crux of this strategy is to keep the attacker as far as possible from the data in the system by building various layers of obstacles in the path to the target. These obstacles include (but are not limited to) access control mechanisms, network-level security, operating system checks, security policies, runtime / static application code analysis, trusted hardware, and various intrusion detection and prevention systems. Since security and its breach is a cat-and-mouse game and each side tries to overcome the measures adopted by the other, incidents of data breach still occur even after the above-mentioned obstacles are in place.
In today's hi-tech world, winning the trust of the end user / customer is the key to winning the game. Incidents of data loss or breach hurt the affected organization in two ways. First, they cause the targeted organization huge financial and reputational loss; second, they make the organization answerable to the government for the obligations and regulations imposed on it (as per its industry requirements) [7].
Data snoopers can broadly be categorized into three major classes:
Hackers
Administrators / Insiders
Government agencies
All three have their own intentions and benefits associated with the organization's data and may try to access it in their own ways.
Figure 1 Three Dimensions Of Data Snoopers
1.2.2 Multi-Tier Architecture
Enterprise-level software implementations manipulate data in such quantities that they have gradually generated the requirements for further research in the arena of Big Data. These requirements have in turn led to advancements in efficient and effective data storage, retrieval and processing techniques. These techniques were implemented to fulfill the requirements of high-throughput applications.
Multi-tier application architecture has been in use in the industry for many years. This architecture resolves many issues by defining clear boundaries between the presentation layer, the business logic layer and the data layer. Adopting a multi-tier architecture allows these layers to be updated and modified independently without majorly affecting the other layers. Client-server / multi-tier architecture became common soon after its introduction because of the flexibility, ease of use, performance and control it provides. The majority of enterprise-grade applications in the modern world use the same strategy for deployment and operations [6].
1.2.3 Cloud Computing
In today's world, this architecture has also proven its success when implemented on the cloud hosting model. It has become a de facto standard for web applications hosted on cloud platforms. The effectiveness of this architecture makes it the first choice for applications that require high throughput and a large user base. Due to a list of benefits such as no setup / initial infrastructure costs, a pay-as-you-grow model, less administrative overhead, shifting of technical management expertise to the cloud provider and many others, more and more organizations are moving toward the adoption of public cloud computing. However, cloud-based applications (specifically those built on PaaS) also add another dimension of unintended snoopers, i.e. the cloud server administrators, who may use their server administration privileges to look into the data stored on these servers, outside the control of the enterprises that own the data.
1.2.4 Government Spying
The third stakeholder that counts against the confidentiality of data is the government itself. Government regulations and U.S. security agencies are preying on both states of data, i.e. data in motion and data at rest, held by the companies and infrastructure under their jurisdiction.
1.2.5 End User Privacy
Besides PaaS, SaaS has brought a major revolution in how people use the web. Most of people's daily web-based affairs have now moved to SaaS offerings by major companies. Most people use Gmail, Outlook, etc. for their email; Google Drive and Microsoft OneDrive for their storage; Google Docs, Google Sheets, Google Slides, Office 365, Zoho Office, etc. for their office documentation, and so on. All these offerings, besides their fantastic list of features, effectiveness and ease of use, are avoided by corporations because corporate data needs a level of security that is usually absent in these solutions. These offerings provide good security against the first dimension of data snoopers, i.e. hackers, but against the remaining two, these companies can not only review the data stored on their servers, but in the majority of cases such big companies (Microsoft, Google, Amazon, etc.) have close ties with US government agencies that can access their servers as and when required. Individuals usually compromise on their privacy in exchange for the free offerings of such services.
Companies providing various B2C services also store the data of their customers / clients on their servers. Once compromised, the leakage of the personal information of these clients / customers also impacts the end users, who have nothing to do with either the company or the snoopers, yet whose identity and other related information goes into the wrong hands.
1.3 Objectives of study
This research study was undertaken to identify current trends in international research on ensuring the confidentiality aspect of high-throughput / cloud-hosted multi-tier applications while keeping overhead minimal and achieving acceptable response times. The main goal of this research study is to identify the key areas in high-throughput data applications that are critically vulnerable to loss of data privacy. For this, the target is to find and test an implementable solution that provides the required level of security for such data while keeping the operational and functional overhead minimal, using industry-standard encryption techniques. Performance measurement and analysis is done in both phases, i.e. before and after the implementation of encryption. This performance evaluation is studied and tested on a live cloud using simulated test cases to verify its applicability in real-world scenarios.
To achieve the above-mentioned goal, the following objectives were accomplished in this study:
Studied ways to introduce data security [confidentiality only] in high-throughput / cloud-based web applications
Studied the different techniques in place for this purpose
Selected query processing over encrypted data for performance analysis
Selected CryptDB for the research study to implement and check the performance overhead
Implemented it in a use case designed specifically keeping in view the high-throughput requirements of a national-level medical application such as OpenEMR
Ran the tests on the local machine
Verified initial results from the local infrastructure to assess the feasibility of a production-scale implementation
Ran the tests on Amazon's EC2 cloud in a production-like environment to verify the actual performance fluctuation due to the introduction of confidentiality, using carefully designed use cases covering best to worst cases
Analyzed the results and wrote up the outcomes of this study for presentation
1.4 Scope and Contribution
In this study, we examined the confidentiality aspect of data security in high-throughput / cloud-hosted applications.
The scope of this study is limited to the implementation of the selected encryption model for a multi-tier application on selected use cases of a pre-selected enterprise-grade medical data management application, and to the performance analysis against baseline data gathered from the native application, to see the overhead caused by the introduced confidentiality. These use cases are then used as a proof of concept for implementing the encryption model with minimal implementation, operational and functional overhead.
The contributions made by this study are:
Selection of a multi-tier web application model and addition of a security layer for encryption / decryption
Selection of an effective confidentiality model to be implemented at the security layer
Implementation, assessment and performance analysis of the introduced security layer and its impact on response time in a production-like environment
Compilation of results and the way forward
1.5 Limitations
During this study, the researcher faced the following limitations / issues:
Relocation of the supervisor to Lahore in the middle of the study, which shifted meetings from face-to-face to Skype; over Skype, one cannot fully convey all the work done since the previous meeting due to the limitations of a virtual environment.
Lack of support from the authors of CryptDB, a product of MIT, due to their research engagements (email correspondence); because of this, a scaled-down customized version based on the original technique devised by the CryptDB authors was built by the researcher to use as a proof of concept.
Version changes and issue fixes in the selected national-grade medical record management application (OpenEMR) resulted in rework at some levels.
1.6 Significance of the study
The outcome of this study presents a fully implementable solution, tested for minimal impact on implementation, operational and functional aspects in an enterprise-grade cloud-based environment, which ensures confidentiality of data for high-throughput / cloud-based applications. The same solution can be equally beneficial for privately hosted applications using multi-tier architecture.
1.7 Thesis Outline
This section provides the outline of the rest of the document, which is as follows:
Chapter 2 discusses the background, existing methodologies and prior work on the system under study.
Chapter 3 discusses the proposed working model and the technical solution devised for this study.
Chapter 4 discusses the research methodology and the design of experiments for the research study.
Chapter 5 discusses the tests performed on the proposed model and the raw results obtained during the experiments.
Chapter 6 contains the conclusions drawn from the results and the outcomes achieved by performing the whole research study.
Chapter 2
LITERATURE REVIEW
2.1 Introduction
Theft of data and breaches of security are commonly seen issues in web-based applications. Even high-tech companies become victims of such attacks, such as NVIDIA Corporation, which lost user names and passwords from its servers on January 06, 2015 [8]. Cost-effective offerings from cloud computing push companies to outsource their IT infrastructures and/or hosting to cloud providers [9]. This cloud-based hosting gives a curious or malicious administrator a way to snoop on the data. With a slight modification, the same threat applies to the private cloud as well, where an insider can breach the confidentiality of data. US government agencies, having legal cover, can access data on servers residing on US soil or under the supervision of companies registered in the United States [3] [10].
An effective approach to minimize the impact of a data breach is to encrypt the data [11], which transforms it into an unintelligible form from which the snooper cannot gain any benefit. Besides solving the problem, however, encryption brings its own overheads, including (but not limited to) implementation, functional, operational and performance overheads.
In the upcoming sections, we look at the general architecture of today's enterprise-grade high-throughput applications and the ways we can introduce security through encryption while keeping the above-mentioned overheads minimal.
2.2 Application Architecture
Multi-tier architecture is an architecture used to implement client-server applications with separate layers for presentation, business logic and the database. Another synonym for this architecture is n-tier architecture. The physical separation of the layers results in greater efficiency, greater control and independent updates. Using a multi-tier approach, the designer of the system gives flexibility and independence to the developers by segregating the different technical aspects of the system into different layers. This segregation allows modification of existing layers and/or addition of new layers to cope with new demands on the system without interrupting the entire application.
The most popular and most widely used n-tier architecture internationally is the 3-tier architecture. The working principle of the 3-tier architecture is shown in the figure below.
Figure 2 Multi-Tier Architecture
In this model, the client interacts with the front end / application logic using a software client that displays the presentation layer of the application. This front-end layer interacts with the business logic embedded in the application on the App Server, which in turn requests the data from the data layer on the DB Server as required. Upon receiving the data query, the DB Server prepares the requested data and sends it to the business logic unit on the App Server which, after carefully checking that the client request is satisfied, sends the data to the presentation layer, which formats and displays the data in visual form on the client PC.
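The request flow described above can be condensed into a minimal sketch. The function and table names below are illustrative only (they are not taken from the test application used in this thesis), and an in-memory SQLite database stands in for the DB server:

```python
# Minimal sketch of a 3-tier request flow (hypothetical names).
import sqlite3

def data_layer(conn, emp_id):
    # DB Server: prepares the requested data.
    return conn.execute(
        "SELECT name, designation FROM employees WHERE id = ?", (emp_id,)
    ).fetchone()

def business_logic(conn, emp_id):
    # App Server: validates the request and applies business rules.
    if emp_id <= 0:
        raise ValueError("invalid employee id")
    return data_layer(conn, emp_id)

def presentation(conn, emp_id):
    # Client / front end: formats the result for display.
    name, designation = business_logic(conn, emp_id)
    return f"{name} ({designation})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, designation TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 'Engineer')")
print(presentation(conn, 1))   # Alice (Engineer)
```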
2.3 Ensuring Confidentiality Through Encryption
There are two ways of introducing encryption into a system to ensure confidentiality. The traditional approach is simple and straightforward: to keep the data safe from adversaries, encrypt it using traditional encryption schemes. As discussed in the previous section, in a multi-tier architecture the data resides in the database server, so to achieve this goal one has to implement encryption at the database layer. Nowadays, all medium- to high-end databases support encryption, which makes the initial implementation of encryption for data security very easy using built-in or custom functions. From an operational point of view, however, this neither provides the required security nor is it very efficient in terms of functionality. In terms of security, the built-in functions require both the key and the data to be supplied for encryption; anyone with access to the query log (e.g. the DB server administrators) may thus obtain the encryption keys, and data confidentiality is compromised by the key leakage. It is also not an efficient technique. Consider, for example, a database table containing details about employees. When a user issues a query to see the name and designation of those employees whose monthly salary is greater than or equal to 10,000/-, in an unencrypted environment the database makes use of indexes to quickly scan the table: in a logarithmic number of operations with respect to the number of rows in the table, the database server finds the required results and passes them on to the user. Now consider the traditional encryption approach: since all values in the database table are encrypted, the query has to scan the whole table, retrieve the results, decrypt them, filter them as per the query's requirements, and only then return the results to the user.
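As a concrete illustration of this overhead, the minimal sketch below assumes the third-party Python cryptography package (Fernet) as a stand-in for a conventional randomized column cipher; the table contents are illustrative only. Because equal plaintexts produce different ciphertexts, the salary filter cannot use an index and every row must be decrypted before it can be compared:

```python
# Sketch: with conventional (randomized) column encryption, a range filter
# cannot use an index -- every row must be fetched and decrypted first.
from cryptography.fernet import Fernet

f = Fernet(Fernet.generate_key())

# Encrypted "salary" column as it would be stored in the database.
rows = [(name, f.encrypt(str(salary).encode()))
        for name, salary in [("Alice", 12000), ("Bob", 8000), ("Carol", 15000)]]

# WHERE salary >= 10000 has to be evaluated after decryption, outside the DBMS.
matches = [name for name, enc_salary in rows
           if int(f.decrypt(enc_salary).decode()) >= 10000]
print(matches)   # ['Alice', 'Carol']
```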
Another approach is to implement query processing over encrypted data in the solution, as this approach is far more practical, provides a high degree of security and promises efficient functionality. Homomorphic and functional encryption techniques provide ways to do this. In the next sections, we discuss the overview and current trends in the literature regarding query processing over encrypted data, with a focus on implementable solutions.
2.4 Homomorphic Encryption
Encryption that allows computations to be performed on ciphertext is known as homomorphic encryption: upon decryption, the result matches the result of the same computation performed on the unencrypted data. This concept was introduced by MIT researchers in 1978 under the title of privacy homomorphisms [12]. Many novel approaches have emerged since then to process queries over encrypted data with better efficiency in terms of time and space requirements compared to the original idea.
Significant later work in this area enabled searching using keywords associated with encrypted data [13].
2.4.1 Fully Homomorphic Encryption [FHE]
A cryptosystem that allows an arbitrary set of operations on encrypted data without exposing any information about the underlying plaintext is known as a fully homomorphic cryptosystem. Fully homomorphic encryption schemes are based on asymmetric (public-key) cryptography. This technique guarantees semantic security [14]: for anyone other than the holder of the private key, even possession of the public key does not reveal anything about the plaintext behind a ciphertext.
In such a system, a key generation algorithm is used to set up a pair of keys, public and private. The public key is applied to the plaintext to obtain the ciphertext, which can only be decrypted using the private key of the same pair. Many FHE schemes have now been designed by researchers that are capable of executing and computing almost all operations on encrypted data [15] [16] [17] [18] [19] [6]. FHE theoretically provides semantic security, which requires that a malicious user holding a ciphertext and the public key used to produce it should be unable to learn anything about the underlying plaintext except its length.
It took almost 30 years to arrive at a workable FHE scheme, which was presented by Craig Gentry in 2009. In his research work, a fully homomorphic encryption technique [15] was designed and later tested that allows various computations to be executed directly over encrypted data. The technique is quite marvelous, but due to its orders-of-magnitude execution cost it is not implemented in production environments despite the confidentiality it ensures. Since its introduction, many variants of the original scheme have been proposed that improve on it, but they are still orders of magnitude slower, which prevents production-level implementation in the real world [6].
2.4.2 Partial Homomorphic Encryption (PHE)
Systems that fall under partial homomorphic encryption (PHE) are those that support only some operation(s) on ciphertext, such as addition, multiplication or quadratic functions. Some examples of PHE are the BGN cryptosystem [20], ElGamal [21], Paillier [22] and Goldwasser-Micali [23]. They provide semantic security, with the restriction that they support only a specific function to be computed over encrypted data. Their performance is better than that of existing FHE schemes [6].
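To illustrate the additive property that Paillier [22] provides, the sketch below implements textbook Paillier with deliberately tiny, insecure parameters; it is a teaching aid only and not the implementation used by CryptDB or any production system:

```python
# Textbook Paillier with toy parameters, showing the additive homomorphic
# property: multiplying ciphertexts decrypts to the sum of the plaintexts.
import math, random

p, q = 11, 13                                   # toy primes (insecure)
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
L = lambda u: (u - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)             # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 42, 17
c = (encrypt(a) * encrypt(b)) % n2              # multiply ciphertexts ...
print(decrypt(c))                               # ... to add plaintexts: 59
```

This additive property is what an additive scheme such as Paillier contributes to query processing over encrypted data, e.g. SUM-style aggregation without decrypting individual values.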
2.4.3 Functional Encryption (FE)
Computation over encrypted data using functional encryption (FE) schemes results in controlled leakage of information related to the function applied to the ciphertext. This small information leakage improves the performance of the system many-fold; in any case, the encrypted data itself is not revealed in any form. The concept was initially introduced by Amit Sahai and Brent Waters in their paper "Fuzzy Identity-Based Encryption" in 2005 [24]. Keeping in view the performance gain achieved by the controlled leakage of information required for the computation, many functional encryption schemes have been designed to compute specific functions over encrypted data efficiently in terms of compute, time and space [25] [26] [27].
2.5 Current Frameworks
In this phase of the study, frameworks from the existing literature were reviewed that provide systems for query processing over encrypted data. Due to time limitations, the scope was narrowed to systems that provide an implementable framework for real-world deployment, so that performance analysis could test their suitability in production-like scenarios. Later, the one that provides the best balance of ease of implementation versus the efficiency and security provided was chosen for a test implementation. The details of these frameworks are given below.
2.5.1 CryptDB
A breakthrough research approach conceived by researchers at MIT resulted in the creation of CryptDB [11], a system that provides an implementable way to add confidentiality over databases transparently and allows execution of queries over encrypted data with minimal overhead.
CryptDB is a SQL-based implementation and hence can be installed with all major SQL-based DBMSs. The CryptDB design is based on two fundamental ideas: first, the ability to execute SQL queries over encrypted data; second, adjusting the encryption of the data as required, in terms of both security and functionality.
Figure 3 CryptDB Architecture [9]
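The second idea, adjustable encryption, can be sketched as nested layers. The sketch below is a simplification of CryptDB's onion approach, not its actual implementation: the outer randomized layer uses the third-party cryptography package (Fernet), and the inner deterministic layer is merely simulated with a keyed HMAC, which, unlike CryptDB's DET scheme, cannot be decrypted:

```python
# Sketch of adjustable ("onion") encryption: values start under a randomized
# layer; when an equality query arrives, the proxy peels down to a deterministic
# layer so the DBMS can compare ciphertexts directly.
import hmac, hashlib
from cryptography.fernet import Fernet

det_key = b"det-demo-key"                      # illustrative key only
rnd = Fernet(Fernet.generate_key())

def det(value: str) -> bytes:
    # Deterministic layer simulated with a keyed HMAC (not decryptable).
    return hmac.new(det_key, value.encode(), hashlib.sha256).digest()

# Stored form: RND(DET(value)) -- no computation is possible on the outer layer.
stored = [rnd.encrypt(det(v)) for v in ["alice", "bob", "alice"]]

# An equality query arrives: peel the RND layer, then compare DET ciphertexts.
peeled = [rnd.decrypt(c) for c in stored]
print(peeled[0] == peeled[2], peeled[0] == peeled[1])   # True False
```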
2.5.2 Monami
Monami is another system developed for query processing over encrypted data; it specifically targets OLAP (online analytical processing) databases [9]. The key feature of Monami is that it processes queries over encrypted data in two modes. It splits complex queries and runs most of the work at the database end; however, the database supports only a few of all query types over encrypted data. Monami therefore converts unsupported complex queries in such a way that the supported portion of the query executes on the database while the remaining portion of the same query runs on the client side.
Figure 4 Overall Architecture Of Monami [6]
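This split-execution idea can be sketched as follows; the column names, keys and query are hypothetical, deterministic encryption is simulated with a keyed HMAC, and the third-party cryptography package (Fernet) stands in for the value encryption. The equality predicate runs on the server side, while the range filter and the aggregation run on the client after decryption:

```python
# Sketch of split query execution: the supported part of the predicate runs
# against the encrypted store, the rest runs client-side after decryption.
import hmac, hashlib
from cryptography.fernet import Fernet

det_key, f = b"det-demo-key", Fernet(Fernet.generate_key())
det = lambda v: hmac.new(det_key, v.encode(), hashlib.sha256).digest()

# Server-side store: dept under deterministic encryption, salary under Fernet.
server_rows = [{"dept": det(d), "salary": f.encrypt(str(s).encode())}
               for d, s in [("sales", 900), ("sales", 1500), ("hr", 2000)]]

# Query: SELECT SUM(salary) WHERE dept = 'sales' AND salary > 1000
candidates = [r for r in server_rows if r["dept"] == det("sales")]   # server part
total = sum(s for s in (int(f.decrypt(r["salary"]).decode()) for r in candidates)
            if s > 1000)                                             # client part
print(total)   # 1500
```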
2.5.3 Mylar
Mylar is another framework that allows query processing over encrypted data [28]. Mylar encrypts data when it is written to the server and decrypts it in the end user's browser. It is still in the research phase (the first paper on Mylar was presented at USENIX NSDI '14); however, the promises made by Mylar are fascinating. Mylar supports keyword searches over encrypted data using multi-key searchable encryption [29]. The beauty of this web application development platform is that it can search over encrypted data that has been encrypted with different keys.
Figure 5 Mylar Architecture [13]
2.5.4 Selection for test implementation
For this study, due to time limitations, we had to select one of these systems for implementation and performance analysis in an enterprise-grade deployment over the cloud. All of the above solutions were gauged on the following factors.
Ease of Implementation: For this aspect, the solutions were gauged on how complex or easy they are to implement in a real-world scenario. Quantification was done on a scale of 1 to 10 under the following classification:
o Minimal effort at implementation time = 8-10 marks
o Medium to minimal effort at implementation time = 5-8 marks
o Minimal effort at design time = 3-5 marks
o Medium to minimal effort at design time = 0-3 marks
Below are the results for ease of implementation.

Ease Of Implementation (10)           CryptDB (10)   Mylar (10)   Monami (10)
Data Tier Changes (-1 to -3)              (-02)         (-02)        (-02)
Application Tier Changes (-1 to -3)       (-00)         (-02)        (-02)
Client Tier Changes (-1 to -3)            (-00)         (-02)        (-01)
Other Tier Changes (-1)                   (-01)         (-01)        (-01)
Results                                     07            03           04

Table 1 Ease Of Implementation Provided By Systems Under Evaluation
Range of Security: For this aspect, the solutions were gauged on how much security they provide and how many threat levels they nullify after implementation, i.e. client side, web server side and database server side. Quantification was done on a scale of 1 to 10 under the following classification:
o Protection at the maximum number of tiers = 8-10 marks
o Protection at a medium number of tiers = 5-8 marks
o Protection at a minimum number of tiers = 3-5 marks
o No to minimal protection across tiers = 0-3 marks
Below are the results for range of security.

Range Of Security (00)                CryptDB   Mylar   Monami
DB Level (+04)                          (04)     (04)    (04)
Application Level (+02)                 (00)     (02)    (00)
Client Level (+02)                      (00)     (02)    (00)
Other (+01)                             (01)     (01)    (01)
Multi Key Search Support (+01)          (01)     (01)    (01)
Result (10)                              06       10      06

Table 2 Range Of Security Provided By Systems Under Evaluation
Range of Functionality: For this aspect, the solutions were gauged on how much functionality they provide / support for the prime target of query processing over encrypted data. Quantification was done on a scale of 1 to 10 under the following classification:
o Supports confidentiality of columns that are not processed during query processing [basic requirement for an encrypted database] = 1 mark
o Supports equality checks in queries processed over encrypted data [select queries with equality checks, equality joins, DISTINCT, GROUP BY and COUNT] = 1 mark
o Supports order processing in queries over encrypted data [filtering data on ranges, sorting using ORDER BY, MIN, MAX] = 2 marks
o Supports functions like SUM / addition [support for computation in queries processed over encrypted data] = 2 marks
o Supports JOIN on encrypted columns [support for relationships over encrypted data] = 2 marks
o Supports word search in text columns [queries containing LIKE or equality on text columns containing encrypted data] = 2 marks
Below are the results for range of functionality.

Range Of Functionality (00)           CryptDB (00)   Mylar (00)   Monami (00)
Non Processed Col. Conf. (+01)            (01)           (01)         (01)
Equality Check (+01)                      (01)           (01)         (01)
Order Preserving (+02)                    (02)           (00)         (02)
Sum (+02)                                 (02)           (00)         (02)
Join (+02)                                (02)           (00)         (02)
Like / Keyword Search (+02)               (02)           (02)         (02)
Results                                    10             04           10

Table 3 Range Of Functionality Provided By Systems Under Evaluation
Below are the combined results for the above-mentioned measurements.

Type                  CryptDB (00)   Mylar (00)   Monami (00)
Ease (0.4)             06x0.4=2.4     03x0.4=1.2    04x0.4=1.6
Functionality (0.3)    10x0.3=3.0     04x0.3=1.2    10x0.3=3.0
Security (0.3)         06x0.3=2.4     10x0.3=4.0    06x0.3=2.4
Results                7.2            6.4           6.6

Table 4 Comparison Of Solutions
Based on the above results, we selected CryptDB as our product for test implementation; its details are given in the next chapter.
Chapter 3
CRYPTDB
3.1 Introduction
This chapter discusses the technical solution for this study. This technical solution was built as a limited-scope proof of concept only, to assess the feasibility of a large-scale implementation of the concepts gained through this study.
3.2 Security in Multi-Tier Architecture Using CryptDB
Multi-tier application architecture has been in use in the industry for many years. This architecture resolves many issues by defining clear boundaries between the presentation layer, the business logic layer and the data layer. It is commonly used in enterprise-grade applications hosted on private infrastructure as well as in the cloud. The same model was adopted for the design of experiments for this research study.
In the solution model, another server named CRYPTDB is placed between the application server and the database server, as shown in Figure 6.
Figure 6 Model For Design Of Experiment For This Research Study
This CryptDB server intercepts the traffic between the web server and the database server and acts on the two-way traffic to introduce a security layer using the following basic rules (a simplified code sketch of these rules follows the list):
Insert Query: This CryptDB will receive the insert queries from the application
server, rewrite them after encrypting data values using a preselected key and send
them to the database server to write encrypted data in the relevant tables.
Select Query : Upon retrieval of data, the CryptDB intercepts the select query,
dissect it, and perform operation on it based on the below mentioned rules
o If it is a simple select query, the CryptDB forwards the query to the
database, get the data returned, decrypt it with the pre-selected key and
forward the query to the database server. Upon receiving the encrypted
results, it decrypts the results and send the data to the webserver.
o If it contains a where clause and match contain straight equal to condition,
it encrypt the condition value with the key to rewrite them to demand data
CLIENT
APP
DB
CRYPTDB
Figure 6 Model For Design Of Experiment For This Research Study
- 28 -
from the server in the encrypted form, decrypt it and send the required data
to the application server for onward submission to client.
o If it contains a where clause and match contains complexity e.g. like, greater
than, less than etc., then the sentry demands the whole data from the table
for the specific column (s) used in the where clause, decrypt them, perform
a match function, note down the row id(s), retrieve the selected columns for
these row ids, decrypt them and send the data to the webserver.
Delete Query : Upon deletion of data, the CryptDB intercepts the delete query,
dissect it, and perform operation on it based on the below mentioned rules
o If it is a simple delete query, the CryptDB forwards the query to the
database for deletion of the data from the table.
o If it contains a where clause and match contain straight equal to condition,
it encrypt the condition value with the key to rewrite them to delete data
from the server.
o If it contains a where clause and match contains complexity e.g. like, greater
than, less than etc., then the sentry demands the whole data from the table
for the specific column (s) used in the where clause, decrypt them, perform
a match function, note down the row id(s) and then delete the rows form
the table matching these identified row id(s).
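As a rough illustration of the equality-rewriting rule above, the following PHP sketch
shows how a proxy could replace the literal in a WHERE equality with a deterministically
encrypted value. The helper and column names (det_encrypt, the _det suffix) and the key
handling are hypothetical; this is not CryptDB's actual code, which runs inside a MySQL
proxy, but only an illustration of the idea.

<?php
// Illustration only: rewrite WHERE col = 'literal' so that the database server can
// answer it over deterministically encrypted data. Names are hypothetical.
function det_encrypt(string $key, string $value): string {
    // Deterministic AES-128-CBC with a fixed all-zero IV (CryptDB itself uses an
    // AES-CMC variant for its DET layer; a zero-IV CBC is enough for a sketch).
    return openssl_encrypt($value, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, str_repeat("\0", 16));
}

function rewrite_equality(string $sql, string $key): string {
    // Replace the plaintext literal with the hex of its deterministic ciphertext
    // and point the predicate at the encrypted column.
    return preg_replace_callback(
        "/WHERE\s+(\w+)\s*=\s*'([^']*)'/i",
        function (array $m) use ($key): string {
            return "WHERE {$m[1]}_det = x'" . bin2hex(det_encrypt($key, $m[2])) . "'";
        },
        $sql
    );
}

$key = random_bytes(16);
echo rewrite_equality("SELECT fname FROM patients WHERE cnic = '3520212345671'", $key), "\n";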
3.3 Query Execution and Data Confidentiality
The power of CryptDB lies in enabling the DBMS to execute queries over encrypted data
in the same way as it executes them over unencrypted data. The strength of this approach
is that no changes are needed in existing applications, which makes it possible to
introduce confidentiality transparently at installation / configuration time.
The CryptDB server stores the secret MASTER KEY (MK), the metadata about the schema
of the application's back end, and the current security state of the columns in that schema.
During installation / configuration, CryptDB intercepts the creation of the schema and
replaces the table and field names with anonymized names that a human cannot readily
interpret. This protects the schema itself, because a person with access to the database
server or the application server does not know the mapping of fields and tables. CryptDB
also adds some tables / columns to the existing tables to manage the encryption, and
installs user defined functions (UDFs) that enable the DBMS to compute over encrypted
data.
The system flow of CryptDB is shown in Figure 7.
3.3.1 CryptDB Scope For Confidentiality
CryptDB only ensures the confidentiality of the data on the database server; its scope does
not include the integrity or availability of the data, and it targets protection of the
database server only. In terms of confidentiality, CryptDB provides strong protection
against a database administrator or a snooper having full control over the database
server, and hence keeps the data safe in case the DB server is compromised.
Figure 7 System Flow Of CryptDB (the application server issues a query; the CryptDB
server intercepts it, changes the table / column names in the query and encrypts the
constants; it checks whether an onion adjustment is required and, if so, first issues an
update query for the onion adjustment; the rewritten encrypted query is forwarded to the
DB server, which executes it and returns encrypted results; CryptDB decrypts the results
and forwards them to the application server)
CryptDB provides security guarantees for the data against threats such as compromise of
the DBMS software, a snooper who succeeds in gaining root-level access to the database
server machine, the database administrator or any other entity trying to access the data
through the database management system, and even someone with access to the physical
RAM of the database server. It also protects the data when the database server is hosted
on third-party cloud infrastructure. The guarantees provided by CryptDB cover the data
contents as well as the names of tables and columns.
CryptDB encrypts the data in two ways. The data owner can classify the data columns of
a table into two categories, SENSITIVE and BEST-EFFORT. For columns declared
SENSITIVE, CryptDB ensures semantic security; query processing over the encrypted
data in these columns becomes limited, but the security is strongest. For columns declared
BEST-EFFORT, CryptDB ensures the maximum security compatible with the query
operations on those columns; query processing over the encrypted data is fully enabled,
but this may weaken semantic security and leak some information about the contents of
these columns, depending on the query requirements.
3.3.2 CryptDB Encryption Schemes
Following are the encryption schemes used by CryptDB to ensure the confidentiality of
data.
3.3.2.1 RND Scheme
The RND (random) encryption scheme implements a block cipher (AES) in CBC mode [30]
with a random IV, except when the targeted column is an integer column; for integer
columns it uses the Blowfish algorithm to save space, since AES uses a 128-bit block
whereas Blowfish uses a 64-bit block. Columns secured with RND provide the best
protection for the data contents but do not allow any computation over the encrypted
data.
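For illustration, a minimal PHP sketch of an RND-style value encryption (AES-128-CBC
with a fresh random IV per value, via PHP's OpenSSL extension) could look as follows;
it is only a sketch, not CryptDB's implementation.

<?php
// RND sketch: a fresh random IV per value makes equal plaintexts encrypt differently,
// so no computation (equality, ordering, ...) is possible on the stored ciphertexts.
function rnd_encrypt(string $key, string $value): string {
    $iv = random_bytes(16);
    return $iv . openssl_encrypt($value, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, $iv);
}

function rnd_decrypt(string $key, string $blob): string {
    return openssl_decrypt(substr($blob, 16), 'aes-128-cbc', $key, OPENSSL_RAW_DATA, substr($blob, 0, 16));
}

$key = random_bytes(16);
var_dump(rnd_encrypt($key, 'Aspirin') === rnd_encrypt($key, 'Aspirin')); // bool(false)
var_dump(rnd_decrypt($key, rnd_encrypt($key, 'Aspirin')));               // string "Aspirin"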
3.3.2.2 DET Scheme
The DET (deterministic) encryption scheme implements a block cipher (AES) using a
variant of CMC mode [30] with a zero IV, except when the targeted column is an integer
column; for integer columns it uses the Blowfish algorithm to save space, since AES uses
a 128-bit block whereas Blowfish uses a 64-bit block. The requirement is that the same
plaintext in a column always produces the same ciphertext. This property enables the
database server to evaluate equality, perform joins and compute GROUP BY, COUNT,
DISTINCT, etc.
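The determinism property itself can be sketched in a few lines of PHP. Here a fixed
zero IV stands in for CryptDB's CMC variant, and the Blowfish handling for integer
columns is omitted; this is an illustration of the property, not the real scheme.

<?php
// DET sketch: with a fixed IV, equal plaintexts give equal ciphertexts, which is
// exactly what lets the DBMS evaluate =, JOIN, GROUP BY, COUNT and DISTINCT
// directly on the stored (encrypted) values.
function det_encrypt(string $key, string $value): string {
    return openssl_encrypt($value, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, str_repeat("\0", 16));
}

$key = random_bytes(16);
var_dump(det_encrypt($key, 'O+') === det_encrypt($key, 'O+'));  // bool(true): equality survives
var_dump(det_encrypt($key, 'O+') === det_encrypt($key, 'AB-')); // bool(false)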
3.3.2.3 OPE Scheme
OPE (order-preserving encryption) [31] preserves the order relationship of the data items
while encrypting them: if x < y, then OPEk(x) < OPEk(y), where k is the secret key used to
encrypt the data. Data columns encrypted using OPE enable SORT, MIN, MAX, ORDER
BY, etc. to be executed over encrypted data.
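A toy order-preserving mapping for small non-negative integers can make the
x < y implies OPEk(x) < OPEk(y) property concrete. This is emphatically not the
Boldyreva et al. scheme [31] used by CryptDB; it is neither efficient nor secure, and is
included only as an illustration.

<?php
// Toy OPE: map x to the cumulative sum of key-derived positive gaps, so the numeric
// order of plaintexts is preserved by the encoded values. Illustration only.
function toy_ope(string $key, int $x): int {
    $sum = 0;
    for ($i = 1; $i <= $x; $i++) {
        $gap = 1 + hexdec(substr(hash_hmac('sha256', (string)$i, $key), 0, 2)); // 1..256
        $sum += $gap;
    }
    return $sum;
}

$key = 'demo-key';
var_dump(toy_ope($key, 30) < toy_ope($key, 45)); // bool(true): ORDER BY / MIN / MAX still work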
3.3.2.4 Paillier Cryptosystem
The Paillier cryptosystem [22] is an additively homomorphic encryption scheme that
allows certain computations over encrypted data. Using this scheme enables the
calculation of aggregates such as SUM and AVG, and of queries that require additions
such as salary = salary + 100.
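The additive property behind this can be written out explicitly (standard Paillier
notation, with public modulus n and encryption function E; this is textbook material
rather than anything specific to CryptDB):

E(m_1) \cdot E(m_2) \bmod n^2 = E(m_1 + m_2 \bmod n), \qquad E(m)^{c} \bmod n^2 = E(c \cdot m \bmod n)

So an update such as salary = salary + 100 is executed by multiplying the stored
ciphertext by E(100) modulo n^2, and AVG is obtained by dividing a homomorphically
computed SUM by COUNT after decryption.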
3.3.2.5 SEARCH
The SEARCH scheme provides the word-search functionality in CryptDB [13]. Columns
encrypted with SEARCH allow queries containing the LIKE operator to be executed over
the encrypted data; however, this functionality is limited to full words, and matching on
partial words is not possible. SEARCH is nearly as secure as RND.
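The following PHP sketch conveys why the search works only on whole words: every
word is replaced by a keyed HMAC token and matching is done on tokens. This is a much
simplified stand-in for the Song-Wagner-Perrig construction [13] that CryptDB actually
uses, not an implementation of it.

<?php
// Simplified keyword-token search: store an HMAC token per word, search by token.
// A full word matches because its token is in the stored set; a partial word
// produces a different token and therefore cannot match.
function word_tokens(string $key, string $text): array {
    $tokens = [];
    foreach (preg_split('/\W+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $tokens[] = hash_hmac('sha256', $word, $key);
    }
    return $tokens;
}

$key    = 'demo-key';
$stored = word_tokens($key, 'Patient complains of chest pain');
var_dump(in_array(hash_hmac('sha256', 'chest', $key), $stored, true)); // bool(true)
var_dump(in_array(hash_hmac('sha256', 'ches',  $key), $stored, true)); // bool(false)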
Chapter 4
RESEARCH METHODOLOGY
4.1 Introduction
This chapter describes the research methodology followed in this study. Details about the
application scenario, the phases of the experiments, the experiments performed, the
workload generation and the evaluation criteria are discussed in the following sections.
4.2 Sample Application Scenario
One key example of a high-throughput application is a national / enterprise level medical
health management system. Many countries already have such infrastructure
implemented and operational: every citizen is registered, and whenever medical
treatment is provided, it is recorded so that a complete medical history is available for
later treatment decisions. Other stakeholders of the medical system, such as hospitals,
laboratories, pharmacists and insurance companies, can also use this system to provide
medical and financial services to the patient and to other stakeholders. The upfront
benefits of such implementations include (but are not limited to) the following.
o Patients have their complete medical history in place in electronic format, readily
available when required by the doctors.
o Doctors have a complete picture of the patient's past medical history in front of
them, enabling better treatment decisions in view of current and past diseases,
allergies, etc.
o Drug control can be implemented, as pharmacists issue drugs to patients only
after receiving an online prescription from the doctor.
o Lab test management becomes easier, as test requests and results are readily
available to the patient and the doctors as and when required.
o Insurance companies can provide medical insurance services to their subscribers
efficiently and transparently: after immediate verification of the validity of the
patient's medical insurance coverage, the hospital starts providing medical
services and forwards the bills to the insurance company for later clearance.
o The government can track ailment and disease propagation patterns and act
accordingly to devise and implement national health policies that better serve the
citizens.
o National / local level disease identification and spread / control graphs can be
produced based on the cases reported for a particular disease.
o Researchers can work on the medical history of patients with a specific disease
and test the effectiveness of different medicines against that disease to study the
success ratio.
4.3 Test Scenario
High-throughput applications such as medical and health record management systems
are widely used in developed countries to manage the health records of their citizens, and
this approach is expected to be followed by other nations as they roll out information and
communication technology infrastructure. Medical records mirror the personal attributes
and conditions of a patient and need confidential handling and careful access control.
Besides this privacy requirement, medical records are also used for research and
development in health service provisioning at the national scale.
Encryption can be used in such applications, initially to achieve confidentiality of the
data; later it can also be used to control access to the data by sharing the encryption keys
only with the entities that genuinely require access. However, encryption introduces
additional processing overhead both when writing and when reading data, and this
overhead discourages high-throughput applications such as enterprise / national level
medical and healthcare record management systems from implementing it, in order to
retain acceptable performance.
4.3.1 Technologies Used During Experiments
Keeping in view the above facts, OpenEMR [32] was selected as the test application.
OpenEMR is an electronic health record management system with complete coverage of
the dimensions of an integrated health management system. The technologies behind the
test setup are open source, i.e. LAMP (Linux, Apache, MySQL, PHP): all servers run the
Ubuntu Server operating system, PHP is used for coding the test scenarios, and the back-
end data is stored in a MySQL database.
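In this setup no OpenEMR code needs to change; only its database connection settings are
pointed at the CryptDB proxy instead of MySQL. Assuming the usual
sites/default/sqlconf.php settings file and a proxy listening on port 3307 (both should be
checked against the deployed OpenEMR version), the change looks roughly like this:

<?php
// sites/default/sqlconf.php (excerpt, illustrative values for this test bed):
// OpenEMR talks to the CryptDB proxy; the proxy talks to MySQL on the DB server.
$host  = '10.0.0.12';  // CryptDB / MySQL-Proxy machine, not the MySQL server itself
$port  = '3307';       // proxy port (MySQL keeps listening on 3306 behind it)
$login = 'openemr';
$pass  = 'secret';
$dbase = 'openemr';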
4.3.2 Implementation Scenarios
Four experimental implementation scenarios were selected for testing and comparison of
results, as listed below.
Scenario Name | Details                                                    | Remarks
Experiment-01 | One virtual machine                                        | Application server and database server on the same machine, no CryptDB
Experiment-02 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, no CryptDB
Experiment-03 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, CryptDB installed on the DB server
Experiment-04 | Three virtual machines, one each for the web server, DB server and CryptDB | Application server, database server and CryptDB on separate machines
Table 5 Implementation Scenarios
4.4 Test Execution
Experiments were planned and executed in two phases, the details of which are given below.
4.4.1 Experiment On Local Test Bed
In the first phase, the model shown in Figure 6 was implemented on local machines to
test and validate its functionality. Three servers were prepared for the first three
experiments; due to the limited resources of the local infrastructure, Experiment 04 was
not performed in this phase.
The web server in this test phase was implemented using the Apache HTTP server [33].
The database used was MySQL [34], the widely used open source database management
system. The CryptDB functionality was implemented as a proof of concept using MySQL
Proxy, with the interception logic written in the Lua programming language. OpenEMR
[32], an open source enterprise-grade electronic medical record management system, was
selected as the sample application on which to implement data security. Fabricated
records of 100,000 (one hundred thousand) patients were generated and inserted into the
databases of all three experiments. For virtual load generation to simulate users, we used
JMeter [35], a popular open source performance measurement tool.
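The fabricated records were bulk-inserted through the proxy so that they were stored
encrypted. The following PHP/PDO sketch shows the shape of such a generator; the host,
credentials and the patient_data column subset shown are illustrative assumptions rather
than the exact script used in the experiments.

<?php
// Generate and insert synthetic patient rows through the CryptDB proxy with PDO.
$pdo  = new PDO('mysql:host=10.0.0.12;port=3307;dbname=openemr', 'openemr', 'secret');
$stmt = $pdo->prepare('INSERT INTO patient_data (fname, lname, DOB, sex) VALUES (?, ?, ?, ?)');

for ($i = 1; $i <= 100000; $i++) {
    $stmt->execute([
        'First' . $i,
        'Last' . $i,
        date('Y-m-d', mt_rand(strtotime('1940-01-01'), strtotime('2010-12-31'))),
        ($i % 2) ? 'Male' : 'Female',
    ]);
}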
Our pilot testing infrastructure in phase 1 was built on virtual machines on a Core i5
system with 4 GB RAM and a 128 GB SSD. Three machines were configured to host the
web server, the database server and the user load generator, with the following
specifications.
Sr. # | Machine             | vCPU | RAM | Hard Disk
1     | Web Server          | 1    | 1GB | 8GB SSD
2     | DB Server           | 1    | 1GB | 8GB SSD
3     | User Load Generator | 1    | 1GB | 8GB SSD
Table 6 Specifications of Machines Used In Phase 01
4.4.2 Experiments On Amazon Cloud
In the second phase, the model shown in Figure 6 was implemented on the Amazon cloud
(EC2). Eight servers were prepared for these scenarios, with different machine
configurations so that the server infrastructures could be compared with each other.
The software stack was the same as in phase 1: the Apache web server [33], the MySQL
database [34], the CryptDB proof of concept built on MySQL Proxy with the interception
logic written in Lua, and OpenEMR [32] as the sample application. Fabricated records of
100,000 patients were again generated and inserted into the databases of the experiments,
and JMeter [35] was used to generate the virtual user load.
Our pilot testing infrastructure in phase 2 was initially built on virtual machines with the
following specifications.
Sr. # | Machine               | vCPU | RAM | Hard Disk
1     | Web Server            | 2    | 4GB | 8GB SSD
2     | DB Server             | 2    | 4GB | 8GB SSD
3     | CryptDB               | 2    | 4GB | 8GB SSD
4     | DB Server For CryptDB | 2    | 4GB | 8GB SSD
5     | User Load Simulator   | 2    | 4GB | 8GB SSD
Table 7 Specifications of Machines Used In Phase 02
Later, the following machines with enhanced specifications were added for further tests.
Sr. # | Machine               | vCPU | RAM   | Hard Disk
6     | Web Server            | 2    | 7.5GB | 32GB SSD
7     | CryptDB               | 2    | 7.5GB | 32GB SSD
8     | DB Server For CryptDB | 2    | 7.5GB | 32GB SSD
Table 8 Enhanced Specifications of Machines Used In Phase 02
4.5 Test Use Cases
Use cases were designed keeping in view the capabilities of CryptDB as well as the
requirements and design of OpenEMR, the selected high-throughput application. These
use cases were designed for and executed in JMeter, a popular open source performance
testing and benchmarking tool from the Apache project.
4.5.1 Test Bed Environment
To create an enterprise-grade scenario, the schema of the OpenEMR tables relevant to the
test scenarios was ported to the database, and 100,000 records were generated and
entered into these tables to mimic a production data set. During insertion of these records
through CryptDB in encrypted form, the average insertion time per record at CryptDB's
most secure onion level was observed to be about 10 milliseconds, which is very
acceptable.
4.5.2 Client Side Load
Keeping in view the security capabilities of CryptDB and the processing they require, a
carefully selected set of scenarios was created for load testing against the servers. These
scenarios were classified as Light Load, Medium Load and Heavy Load, with the
following characteristics.
Sr. # | Load Name   | Compute Intensive | Memory Intensive | Data Intensive
01    | Light Load  | Low               | Low              | Low
02    | Medium Load | Medium            | Medium           | Medium
03    | Heavy Load  | High              | High             | High
Table 9 Client Side Load
These use cases were programmed in PHP and hosted on the web server in the test bed.
A synthetic workload based on these custom loads was generated against the experiment
scenarios, and the response time was recorded for further analysis (a simplified sketch of
one such use-case page is shown below).
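The shape of such a use-case page is sketched below in PHP; the host, credentials and
exact query mix are illustrative, and the real pages were tuned per load level against the
OpenEMR schema.

<?php
// medium_load.php (illustrative): one equality lookup (DET-backed) and one
// range + ordering query (OPE-backed), with the elapsed time reported back.
$pdo   = new PDO('mysql:host=10.0.0.12;port=3307;dbname=openemr', 'openemr', 'secret');
$start = microtime(true);

$pdo->query("SELECT fname, lname FROM patient_data WHERE pid = 4711")->fetchAll();
$pdo->query("SELECT pid, DOB FROM patient_data WHERE DOB > '1980-01-01' ORDER BY DOB LIMIT 50")->fetchAll();

header('Content-Type: text/plain');
echo 'elapsed_ms=', round((microtime(true) - $start) * 1000);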
Chapter 5
EXPERIMENTS & RESULTS
5.1 Introduction
This chapter presents the results obtained from the experiments executed during the
study. As the tests were performed in two phases, the results are presented in the same
order.
5.2 Experiments On Local Test Bed
In phase 01, which was performed on local machines to validate the approach, three
machines were prepared to run the first three experiments, listed below.
Scenario Name | Details                                                    | Remarks
Experiment-01 | One virtual machine                                        | Application server and database server on the same machine, no CryptDB
Experiment-02 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, no CryptDB
Experiment-03 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, CryptDB installed on the DB server
Table 10 Phase 01 Test Bed
All three loads were executed against experiments 01, 02 and 03 of the test bed, initially
using a browser to validate the implementation and configuration. After this validation,
the same loads were defined in JMeter and executed to simulate from 10 to 100 concurrent
users, increasing by 10 users per iteration. Each iteration was executed 5 times to smooth
out outliers, and the average of these 5 executions was used for the analysis.
Below are the results obtained from the phase 01 tests.
5.2.1 Experiment-01 On Local Test Bed
In this experiment on the local test bed, the light, medium and heavy loads were
generated against the Experiment 01 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 100, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of a single machine with the specifications given in Table 6,
hosting both the web server and the database server; no encryption is involved at this
stage. The three loads were executed successfully against this implementation, as shown
in Figure 8.
Figure 8 Experiment 01 On Local Test Bed
5.2.2 Experiment-02 On Local Test Bed
In this experiment on the local test bed, the light, medium and heavy loads were
generated against the Experiment 02 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 100, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 6; no encryption is involved at this stage.
The three loads were executed successfully against this implementation, as shown in
Figure 9.
Figure 9 Experiment 02 On Local Test Bed
5.2.3 Experiment-03 On Local Test Bed
In this experiment on the local test bed, the light, medium and heavy loads were
generated against the Experiment 03 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 100, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 6; for encryption, CryptDB was installed on
the same machine as the database server. The three loads were executed successfully
against this implementation, as shown in Figure 10.
Figure 10 Experiment 03 On Local Test Bed
5.2.4 Comparison Of Means Of Experiments On Local Test Bed
Since in a real-world deployment the load on a website is a complex mixture of the
synthetic loads defined for this study, the mean of all three loads (light, medium and
heavy) is presented in Figure 11.
Keeping a threshold of 5000 ms for the response time, it was observed that after the
introduction of encryption, the local infrastructure can handle up to 50 concurrent users.
Given the limitations of the local test environment, this figure of 50 supported concurrent
users is a very positive signal to further investigate the approach in an environment
mimicking a real production environment. This leads to phase 02 of the study, in which
the experiments are performed on a cloud-based test bed.
Figure 11 Comparison of MEAN of experiments performed on Local Test bed
5.3 Experiments On Amazon Cloud
In phase 02, performed on Amazon's EC2 cloud, nine machines were prepared to test the
experiment scenarios listed below.
Scenario Name | Details                                                    | Remarks
Experiment-01 | One virtual machine                                        | Application server and database server on the same machine, no CryptDB
Experiment-02 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, no CryptDB
Experiment-03 | Two virtual machines, one for the web server and one for the DB server | Application server and database server on separate machines, CryptDB installed on the DB server
Experiment-04 | Three virtual machines, one each for the web server, DB server and CryptDB | Application server, database server and CryptDB on separate machines
Table 11 Phase 2 Test Bed
All three loads were executed against experiments 01, 02, 03 and 04 of the test bed,
initially using a browser to validate the implementation and configuration. After this
validation, the same loads were defined in JMeter and executed to simulate from 10 to
200 concurrent users, increasing by 10 users per batch.
Below are the results obtained from the phase 02 tests.
5.3.1 Experiment-01 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 01 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of a single machine with the specifications given in Table 7,
hosting both the web server and the database server; no encryption is involved at this
stage. The three loads were executed successfully against this implementation, as shown
in Figure 12.
Figure 12 Experiment 01 On Cloud Test Bed
5.3.2 Experiment-02 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 02 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 7; no encryption is involved at this stage.
The three loads were executed successfully against this implementation, as shown in
Figure 13.
Figure 13 Experiment 02 On Cloud Test Bed
5.3.3 Experiment-03 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 03 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server and the database
server, with the specifications given in Table 7; for encryption, CryptDB was installed on
the same machine as the database server. The three loads were executed successfully
against this implementation, as shown in Figure 14.
Figure 14 Experiment 03 On Cloud Test Bed
5.3.4 Experiment-04 On Amazon Cloud
In this experiment on the cloud test bed, the light, medium and heavy loads were
generated against the Experiment 04 configuration. Response times were recorded over
5 executions of the test for batches of users ranging from 10 to 200, with an increment of
10 users per batch, and the mean response time was recorded for each batch.
The infrastructure consists of separate machines for the web server, the database server
and the CryptDB server, with the specifications given in Table 7; installing CryptDB on a
separate machine gives additional security, since CryptDB is managed and processed
separately. The three loads were executed successfully against this implementation, as
shown in Figure 15.
Figure 15 Experiment 04 On Cloud Test Bed
5.3.5 Comparison Of Means Of Experiments On Amazon Cloud
Since in a real-world deployment the load on a website is a complex mixture of the
synthetic loads defined for this study, the mean of all three loads (light, medium and
heavy) is presented in Figure 16.
Keeping a threshold of 5000 ms for the response time, it was observed that even after the
introduction of encryption, a load of 200 concurrent users on the web server is still
manageable by the configuration selected in Experiment 04, as shown in Figure 16.
Figure 16 Comparison Of Mean Of Experiments On Cloud Test Bed
5.3.6 Comparison Of Means Of Extended Experiments On Amazon Cloud
As shown in Figure 16, a load of 200 concurrent users still stays under the threshold set
for this study (a response time below 5000 milliseconds), so the user load was extended
up to 500 users, adding 50 users per batch.
The threshold was reached at approximately 250 concurrent users. Hence, even with
encryption introduced, the high-throughput application still delivers a level of
performance that is implementable in production environments of high-throughput
domains.
Figure 17 Comparison Of Mean Throughput - Extended Test Cases
5.4 Upgradation Of Cloud Infrastructure
To mimic real-world, on-demand resource provisioning in cloud computing, the cloud
infrastructure of the CryptDB-enabled environment was upgraded step by step, and the
performance gain was measured after each upgrade. The upgraded servers used for these
experiments are listed below.
Sr. # | Machine    | vCPU | RAM   | Hard Disk
1     | Web Server | 2    | 7.5GB | 32GB SSD
2     | DB Server  | 2    | 7.5GB | 32GB SSD
3     | CryptDB    | 2    | 7.5GB | 32GB SSD
After the upgrade, the experiments were re-run on the upgraded machines and the
performance gain was recorded, as discussed in the following sections.
5.4.1 Upgraded Test Infrastructure
All three loads were executed against the encryption-enabled test bed infrastructure,
initially using a browser to validate the implementation and configuration. After this
validation, the same loads were defined in JMeter and executed to simulate from 0 to 500
concurrent users, increasing by 50 users per batch.
Below are the results obtained from the upgraded phase 02 tests.
5.4.2 Performance Gain Using Light Load
When the light load was executed against the upgraded configurations involving
encryption, the performance gains with respect to the Experiment-04 configuration were
recorded as shown in Figure 18.
Figure 18 Performance Gain After Upgrade - Light Load
It was observed that for a normal user load for high-throughput applications (50-150
concurrent users), the average performance gain was approximately 35% with respect to
the Experiment-04 configuration when the database server and the CryptDB server were
upgraded. Over the full user range, the highest overall performance gain is about 22%
relative to the Experiment-04 configuration, obtained with the upgrade that increases the
cost by 100%. Further details are shown in Figure 19.
Figure 19 Performance Gain vs Cost - Light Load
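The gain figures in this section and in Appendices F and G are read here as the relative
change in mean response time against the Experiment-04 baseline; assuming that
convention, the gain of an upgraded configuration is

gain(\%) = \frac{t_{Exp\text{-}04} - t_{upgraded}}{t_{Exp\text{-}04}} \times 100

so a table entry of -35.06 corresponds to roughly a 35% reduction in mean response time,
i.e. a 35% performance gain, while a positive entry indicates a slowdown.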
5.4.3 Performance Gain Using Medium Load
When the medium load was executed against the upgraded configurations involving
encryption, the performance gains with respect to the Experiment-04 configuration were
recorded as shown in Figure 20.
Figure 20 Performance Gain After Upgrade - Medium Load
It was observed that for a normal user load for high-throughput applications (50-150
concurrent users), the average performance gain was approximately 40% with respect to
the Experiment-04 configuration when the database server and the CryptDB server were
upgraded. Over the full user range, the highest overall performance gain is about 22%
relative to the Experiment-04 configuration, obtained with the upgrade that increases the
cost by 100%. Further details are shown in Figure 21.
Figure 21 Performance Gain vs Cost - Medium Load
5.4.4 Performance Gain Using Heavy Load
When the heavy load was executed against the upgraded configurations involving
encryption, the performance gains with respect to the Experiment-04 configuration were
recorded as shown in Figure 22.
Figure 22 Performance Gain After Upgrade - Heavy Load
It was observed that for a normal user load for high-throughput applications (50-150
concurrent users), the average performance gain was only about 4% with respect to the
Experiment-04 configuration when the database server and the CryptDB server were
upgraded; note that the heavy load is compute as well as network intensive. The highest
overall performance gain is about 8.5%, obtained with the CryptDB-server-only upgrade,
which increases the cost by 33%. Further details are shown in Figure 23.
Figure 23 Performance Gain vs Cost - Heavy Load
Chapter 6
CONCLUSION & RECOMMENDATIONS
6.1 Introduction
This chapter discusses the outcomes and benefits achieved from the execution of this
study, along with future directions and technical recommendations related to it.
6.2 Conclusion
Keeping in view the rising adoption of cloud computing, the security of data becomes a
risk, because outsourcing exposes data to three kinds of snoopers: hackers, server
administrators and government surveillance. Data security is therefore vital for
businesses, both because of the value of the data and because of the data protection and
privacy regulations imposed in many countries.
To ensure the confidentiality of data, encryption has been the technique of choice in
various forms for hundreds of years. However, it also brings overhead that is usually
unacceptable in high-throughput applications, and its major drawback is that the data
becomes unreadable until decrypted, while business applications rely heavily on
processing their data.
During this study, the latest approaches to query processing over encrypted data were
reviewed for selection and verification in an enterprise-grade, cloud-based production
environment. Among the available research solutions, CryptDB (a research product from
MIT, USA) was selected and implemented in a multi-tier environment on local machines,
simulating use cases from a national-scale, enterprise-grade electronic health
management solution, OpenEMR. Three use cases simulating light, medium and heavy
user load were designed keeping in view the workings of OpenEMR as well as the
capabilities of CryptDB, and a synthetic user load was generated from these test cases
with JMeter, an open source performance analysis tool.
After verification of the results on the local test bed, a production-like environment was
designed on Amazon's cloud, the same experiments were run on this enterprise-grade
environment, and the results were recorded for further analysis. The servers in the cloud-
based environment were then upgraded and the experiments were re-run to evaluate cost
versus performance gain.
It was observed that CryptDB successfully provides data confidentiality on the database
server and keeps all three kinds of snoopers from getting their hands on the
organization's data stored there, while keeping the performance within satisfactory
limits in terms of response time to the end users. CryptDB protects the confidentiality of
the data in the case of a database server compromise, whether by hackers with root-level
access, by a curious database administrator at the cloud provider, or by a government
forcing the cloud infrastructure provider to release the data.
A major advantage of CryptDB is its minimal implementation overhead: it sits
transparently between the application and the database server and requires almost no
changes to the application or the database design. The changes that are required are
handled automatically, which makes its implementation cost-effective as well as
effective from a security standpoint.
The same solution can be equally beneficial outside the cloud, in privately hosted
scenarios. In that case it protects against insider attacks (the organization's own database
administrator taking the place of the cloud provider's administrator) as well as against
hackers and government spying.
6.3 Recommendations
Based on this study, CryptDB is a tested solution for ensuring the confidentiality of data
in the database servers of enterprise-grade multi-tier applications while keeping
performance acceptable. It can be equally beneficial for cloud-based setups (where the
servers are hosted on cloud infrastructure) and privately hosted setups; in privately
hosted setups it may even perform better than in the results of this study, since network
latency there is much lower than in cloud-based setups.
CryptDB can be used by financial institutions, medical organizations, general-purpose
businesses, government agencies and the defense sector, wherever data must be
encrypted to ensure confidentiality without a noticeable degradation in performance.
6.4 Future Work
The implementability shown in this study is a helpful step towards further work in data
security, such as devising and testing solutions that also ensure the integrity of the data.
Further, there is room to adapt the concepts used in CryptDB to NoSQL-based solutions
and to encrypted query processing over Big Data.
APPENDICES
A - SERVER INFRASTRUCTURE LOCAL TEST BED
Sr. # | Machine             | vCPU | RAM | Hard Disk
1     | Web Server          | 1    | 1GB | 8GB SSD
2     | DB Server           | 1    | 1GB | 8GB SSD
3     | User Load Simulator | 1    | 1GB | 8GB SSD
Table 12 Server Infrastructure - Local Test Bed
B - SERVER INFRASTRUCTURE CLOUD TEST BED
Sr. # | Machine    | vCPU | RAM | Hard Disk
1     | Web Server | 2    | 4GB | 8GB SSD
2     | DB Server  | 2    | 4GB | 8GB SSD
3     | CryptDB    | 2    | 4GB | 8GB SSD
Table 13 Server Infrastructure - Cloud Test Bed
C - SERVER INFRASTRUCTURE CLOUD TEST BED (EXTENDED)
Sr. # | Machine    | vCPU | RAM   | Hard Disk
1     | Web Server | 2    | 7.5GB | 32GB SSD
2     | DB Server  | 2    | 7.5GB | 32GB SSD
3     | CryptDB    | 2    | 7.5GB | 32GB SSD
Table 14 Server Infrastructure - Cloud Test Bed (Extended)
D PHASE 01 EXPERIMENTS RESULTS
EXPERIMENT-01 ON LOCAL TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 7          | 8           | 16
20           | 8          | 8           | 20
30           | 9          | 10          | 26
40           | 10         | 11          | 50
50           | 15         | 16          | 113
60           | 36         | 27          | 160
70           | 40         | 41          | 196
80           | 80         | 80          | 231
90           | 112        | 80          | 273
100          | 154        | 139         | 300
Table 15 Local Test - Experiment 01 (mean response time in ms)
EXPERIMENT-02 ON LOCAL TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 13         | 9           | 18
20           | 16         | 10          | 24
30           | 16         | 12          | 35
40           | 20         | 14          | 75
50           | 36         | 22          | 116
60           | 57         | 33          | 163
70           | 53         | 65          | 230
80           | 143        | 99          | 245
90           | 158        | 122         | 263
100          | 126        | 138         | 309
Table 16 Local Test - Experiment 02 (mean response time in ms)
EXPERIMENT-03 ON LOCAL TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 32         | 51          | 4243
20           | 107        | 287         | 7204
30           | 253        | 558         | 9929
40           | 464        | 715         | 11529
50           | 639        | 867         | 14055
60           | 699        | 987         | 13856
70           | 765        | 1081        | 15788
80           | 906        | 1196        | 17661
90           | 1067       | 1252        | 21826
100          | 1105       | 1294        | 21016
Table 17 Local Test - Experiment 03 (mean response time in ms)
PHASE 01 - COMPARISON OF MEAN RESPONSE TIME OF SELECTED TEST CASES
No. of Users | Experiment-01 | Experiment-02 | Experiment-03 | Experiment-04
0            | 0             | 0             | 0             | NA
10           | 10            | 13            | 1442          | NA
20           | 12            | 17            | 2533          | NA
30           | 15            | 21            | 3580          | NA
40           | 24            | 36            | 4236          | NA
50           | 48            | 58            | 5187          | NA
60           | 74            | 84            | 5181          | NA
70           | 92            | 116           | 5878          | NA
80           | 130           | 162           | 6588          | NA
90           | 155           | 181           | 8048          | NA
100          | 198           | 191           | 7805          | NA
Table 18 Comparison Of Mean Response Time (Phase 01, ms)
E PHASE 02 (CLOUD BASED) EXPERIMENTS RESULTS
EXPERIMENT 01 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 2          | 1           | 3
20           | 1          | 1           | 3
30           | 1          | 1           | 4
40           | 2          | 1           | 8
50           | 2          | 1           | 13
60           | 2          | 1           | 19
70           | 4          | 2           | 27
80           | 6          | 4           | 34
90           | 8          | 6           | 37
100          | 18         | 8           | 44
110          | 14         | 13          | 52
120          | 18         | 14          | 60
130          | 23         | 14          | 70
140          | 22         | 11          | 71
150          | 26         | 22          | 78
160          | 31         | 29          | 82
170          | 22         | 29          | 86
180          | 27         | 17          | 96
190          | 30         | 35          | 100
200          | 31         | 40          | 106
Table 19 Cloud Test - Experiment 01 (mean response time in ms)
EXPERIMENT 02 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 2          | 2           | 4
20           | 2          | 2           | 4
30           | 2          | 2           | 5
40           | 2          | 2           | 8
50           | 2          | 2           | 13
60           | 2          | 2           | 38
70           | 4          | 4           | 28
80           | 16         | 6           | 30
90           | 12         | 9           | 37
100          | 4          | 10          | 41
110          | 13         | 40          | 44
120          | 20         | 26          | 49
130          | 26         | 28          | 55
140          | 27         | 31          | 64
150          | 31         | 27          | 64
160          | 23         | 28          | 71
170          | 34         | 28          | 76
180          | 24         | 32          | 81
190          | 36         | 35          | 85
200          | 32         | 37          | 92
Table 20 Cloud Test - Experiment 02 (mean response time in ms)
EXPERIMENT 03 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 12         | 15          | 336
20           | 53         | 54          | 837
30           | 83         | 88          | 1380
40           | 115        | 133         | 1863
50           | 139        | 162         | 2305
60           | 192        | 208         | 2784
70           | 236        | 270         | 3346
80           | 324        | 345         | 3887
90           | 331        | 467         | 4402
100          | 399        | 480         | 4863
110          | 392        | 531         | 5369
120          | 448        | 517         | 5894
130          | 483        | 619         | 6503
140          | 544        | 689         | 7100
150          | 566        | 713         | 7644
160          | 619        | 758         | 8108
170          | 630        | 771         | 8658
180          | 670        | 762         | 9338
190          | 712        | 787         | 9796
200          | 753        | 839         | 10417
Table 21 Cloud Test - Experiment 03 (mean response time in ms)
EXPERIMENT 04 ON CLOUD TEST BED
No. of Users | Light Load | Medium Load | Heavy Load
0            | 0          | 0           | 0
10           | 15         | 16          | 336
20           | 42         | 54          | 837
30           | 73         | 87          | 1380
40           | 101        | 122         | 1863
50           | 137        | 166         | 2305
60           | 165        | 206         | 2784
70           | 207        | 237         | 3346
80           | 250        | 275         | 3887
90           | 280        | 350         | 4402
100          | 278        | 433         | 4863
110          | 347        | 490         | 4005
120          | 399        | 498         | 5665
130          | 487        | 560         | 7489
140          | 561        | 559         | 8123
150          | 482        | 544         | 7774
160          | 569        | 580         | 8175
170          | 620        | 593         | 9605
180          | 611        | 635         | 10595
190          | 631        | 682         | 11146
200          | 688        | 713         | 11821
Table 22 Cloud Test - Experiment 04 (mean response time in ms)
COMPARISON OF MEANS OF EXPERIMENTS ON CLOUD TEST BED
No. of Users | Exp 01 | Exp 02 | Exp 03 | Exp 04
0            | 0      | 0      | 0      | 0
10           | 2      | 3      | 121    | 233
20           | 2      | 3      | 315    | 497
30           | 2      | 3      | 517    | 519
40           | 4      | 4      | 704    | 713
50           | 5      | 6      | 869    | 914
60           | 7      | 14     | 1061   | 1121
70           | 11     | 12     | 1284   | 1374
80           | 15     | 17     | 1519   | 1558
90           | 17     | 19     | 1733   | 1795
100          | 23     | 18     | 1914   | 1985
110          | 26     | 32     | 2097   | 1614
120          | 31     | 32     | 2286   | 2187
130          | 36     | 36     | 2535   | 2845
140          | 35     | 41     | 2778   | 3081
150          | 42     | 41     | 2974   | 2933
160          | 47     | 41     | 3162   | 3108
170          | 46     | 46     | 3353   | 3606
180          | 47     | 46     | 3590   | 3947
190          | 55     | 52     | 3765   | 4153
200          | 59     | 54     | 4003   | 4407
Table 23 Cloud Test - Comparison Of Mean Response Time
F PERFORMANCE GAIN AFTER UPGRADE
PERFORMANCE GAIN IN PERCENTAGE (LIGHT LOAD)
No. of Users | Web+DB Separate Server, Remote CDB | Upgrade - CryptDB Server | Upgrade - CryptDB Server + Database | Upgrade - Web Server + CryptDB Server + Database
0            | 0    | 0      | 0      | 0
50           | 0.00 | -5.84  | -29.93 | -41.61
100          | 0.00 | 6.83   | -24.46 | -30.58
150          | 0.00 | -28.42 | -25.52 | -32.99
200          | 0.00 | -23.26 | -27.33 | -30.38
250          | 0.00 | -6.81  | -20.31 | -18.40
300          | 0.00 | -9.40  | -17.67 | -10.54
350          | 0.00 | -13.07 | -18.40 | -13.07
400          | 0.00 | -10.20 | -19.76 | -12.85
450          | 0.00 | 3.48   | -7.96  | -13.35
500          | 0.00 | 1.04   | -11.24 | -14.72
AVG          | 0.00 | -7.79  | -18.42 | -19.86
Table 24 Performance Gain - Light Load
PERFORMANCE GAIN IN PERCENTAGE (MEDIUM LOAD)
No. of Users | Web+DB Separate Server, Remote CDB | Upgrade - CryptDB Server | Upgrade - CryptDB Server + Database | Upgrade - Web Server + CryptDB Server + Database
0            | 0    | 0      | 0      | 0
50           | 0.00 | -5.42  | -11.45 | -43.37
100          | 0.00 | -33.03 | -41.34 | -46.88
150          | 0.00 | -18.38 | -12.50 | -30.88
200          | 0.00 | -5.47  | -13.32 | -11.08
250          | 0.00 | 5.12   | 4.66   | -21.89
300          | 0.00 | -2.88  | -14.39 | -23.12
350          | 0.00 | 7.94   | -5.08  | -9.79
400          | 0.00 | 9.90   | -2.29  | -11.21
450          | 0.00 | 8.78   | -3.94  | -9.90
500          | 0.00 | 6.61   | -4.28  | -10.29
AVG          | 0.00 | -2.44  | -9.45  | -19.86
Table 25 Performance Gain - Medium Load
PERFORMANCE GAIN IN PERCENTAGE (HEAVY LOAD)
No. of Users | Web+DB Separate Server, Remote CDB | Upgrade - CryptDB Server | Upgrade - CryptDB Server + Database | Upgrade - Web Server + CryptDB Server + Database
0            | 0    | 0      | 0      | 0
50           | 0.00 | -1.52  | -3.90  | 2.95
100          | 0.00 | -3.24  | -4.90  | 0.65
150          | 0.00 | -1.71  | -2.87  | 2.05
200          | 0.00 | -13.11 | -13.26 | -16.48
250          | 0.00 | -8.69  | -8.24  | -6.71
300          | 0.00 | -14.35 | -12.81 | -12.75
350          | 0.00 | -11.65 | -7.88  | -7.05
400          | 0.00 | -10.80 | -5.65  | -5.04
450          | 0.00 | -14.25 | -12.25 | -10.38
500          | 0.00 | -7.87  | -3.05  | -4.12
AVG          | 0.00 | -7.93  | -6.80  | -5.17
Table 26 Performance Gain - Heavy Load
G PERFORMANCE GAIN vs COST OF UPGRADE
LIGHT LOAD
Increase In | Experiment-04 | Upgrade - CryptDB Server | Upgrade - CryptDB & Database Server | Upgrade - CryptDB, Database & Web Server
0-500 Users | 0.00          | -8.56                    | -20.26                              | -21.85
0-150 Users | 0.00          | -9.14                    | -26.64                              | -35.06
Cost        | 0%            | 33%                      | 67%                                 | 100%
Table 27 Performance Gain vs Cost (Light Load)
MEDIUM LOAD
Increase In | Experiment-04 | Upgrade - CryptDB Server | Upgrade - CryptDB & Database Server | Upgrade - CryptDB, Database & Web Server
0-500 Users | 0.00          | -2.68                    | -10.39                              | -21.84
0-150 Users | 0.00          | -18.94                   | -21.76                              | -40.38
Cost        | 0%            | 33%                      | 67%                                 | 100%
Table 28 Performance Gain vs Cost (Medium Load)
HEAVY LOAD
Increase In | Experiment-04 | Upgrade - CryptDB Server | Upgrade - CryptDB & Database Server | Upgrade - CryptDB, Database & Web Server
0-500 Users | 0.00          | -8.72                    | -7.48                               | -5.69
0-150 Users | 0.00          | -2.16                    | -3.89                               | 1.88
Cost        | 0%            | 33%                      | 67%                                 | 100%
Table 29 Performance Gain vs Cost (Heavy Load)
REFERENCES
[1] C. Manning, "Chelsea Manning." [Online]. Available: http://www.chelseamanning.org. [Accessed 25 12 2014].
[2] Wikipedia, "Chelsea Manning." [Online]. Available: http://en.wikipedia.org/wiki/Chelsea_Manning. [Accessed 25 12 2014].
[3] G. Greenwald, No Place To Hide: Edward Snowden, the NSA and the Surveillance State, 1st ed., Metropolitan Books, 2014, p. 272.
[4] Wikipedia, "Edward Snowden." [Online]. Available: http://en.wikipedia.org/wiki/Edward_Snowden. [Accessed 25 12 2014].
[5] O. T. Alliance, "2014 Data Protection & Breach Readiness Guide - Overview," 2014. [Online]. Available: https://www.otalliance.org/resources/2014-data-protection-breach-readiness-guide-overview.
[6] R. A. Popa, Building Practical Systems That Compute on Encrypted Data, Massachusetts, 2014.
[7] O. T. Alliance, "Security and privacy enhancing best practices," 21 01 2015. [Online]. Available: https://www.otalliance.org/system/files/files/resource/documents/ota2015-bestpractices.pdf. [Accessed 25 01 2015].
[8] privacyrights.org, "Chronology of Data Breaches," Privacy Rights Clearinghouse. [Online]. Available: http://www.privacyrights.org/data-breach. [Accessed 25 01 2015].
[9] S. Tu, M. F. Kaashoek, S. Madden and N. Zeldovich, "Processing analytical queries over encrypted data," in Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), Trento, 2013.
[10] Wikipedia, "PRISM (surveillance program)." [Online]. Available: http://en.wikipedia.org/wiki/PRISM_%28surveillance_program%29. [Accessed 31 12 2014].
[11] R. A. Popa, C. M. S. Redfield, N. Zeldovich and H. Balakrishnan, "CryptDB: Protecting Confidentiality with Encrypted Query Processing," in 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, 2011.
[12] R. L. Rivest, L. Adleman and M. L. Dertouzos, "On Data Banks and Privacy Homomorphisms," Foundations of Secure Computation, pp. 168-179, 1978.
[13] D. X. Song, D. Wagner and A. Perrig, "Practical Techniques for Searches on Encrypted Data," in 2000 IEEE Symposium on Security and Privacy, Washington, DC, USA, 2000.
[14] O. Goldreich, Foundations of Cryptography: Volume II, Basic Applications, Cambridge University Press, 2004.
[15] C. Gentry, "Fully Homomorphic Encryption Using Ideal Lattices," in 41st Annual ACM Symposium on Theory of Computing, New York, NY, USA, 2009.
[16] M. van Dijk, C. Gentry, S. Halevi and V. Vaikuntanathan, "Fully Homomorphic Encryption over the Integers," in Advances in Cryptology - EUROCRYPT 2010, 2010.
[17] D. Stehlé and R. Steinfeld, "Faster Fully Homomorphic Encryption," in Advances in Cryptology - ASIACRYPT 2010, 2010.
[18] Z. Brakerski and V. Vaikuntanathan, "Efficient Fully Homomorphic Encryption from (Standard) LWE," in Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, 2011.
[19] Z. Brakerski and V. Vaikuntanathan, "Fully Homomorphic Encryption from Ring-LWE and Security for Key Dependent Messages," in Advances in Cryptology - CRYPTO 2011, 2011.
[20] D. Boneh, E.-J. Goh and K. Nissim, "Evaluating 2-DNF Formulas on Ciphertexts," in Second Theory of Cryptography Conference, 2005.
[21] T. ElGamal, "A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms," in Proceedings of CRYPTO 84, 1985.
[22] P. Paillier, "Public-key cryptosystems based on composite degree residuosity," in International Conference on the Theory and Application of Cryptographic Techniques, 1999.
[23] S. Goldwasser and S. Micali, "Probabilistic encryption & how to play mental poker keeping secret all partial information," in Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, 1982.
[24] A. Sahai and B. Waters, "Fuzzy Identity-Based Encryption," in Advances in Cryptology - EUROCRYPT 2005, 2005.
[25] V. Goyal, O. Pandey, A. Sahai and B. Waters, "Attribute-based encryption for fine-grained access control of encrypted data," in 13th ACM Conference on Computer and Communications Security, 2006.
[26] J. Katz, A. Sahai and B. Waters, "Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products," in 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, 2008.
[27] D. Boneh, A. Sahai and B. Waters, "Functional encryption: Definitions and challenges," in 8th Theory of Cryptography Conference, 2011.
[28] R. A. Popa, E. Stark, J. Helfer, S. Valdez, N. Zeldovich, M. F. Kaashoek and H. Balakrishnan, "Building web applications on top of encrypted data using Mylar," in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), Seattle, 2014.
[29] R. A. Popa and N. Zeldovich, "Multi-Key Searchable Encryption," Cryptology ePrint Archive, Report 2013/508, 2013. [Online]. Available: http://eprint.iacr.org.
[30] B. A. Forouzan, Cryptography & Network Security, 2nd ed., McGraw-Hill Education, 2010.
[31] A. Boldyreva, N. Chenette, Y. Lee and A. O'Neill, "Order-Preserving Symmetric Encryption," in 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 2009.
[32] OpenEMR, "OpenEMR Project." [Online]. Available: http://www.open-emr.org/. [Accessed 17 12 2014].
[33] Apache, "Apache." [Online]. Available: http://www.apache.org/. [Accessed 17 12 2014].
[34] Oracle, "MySQL: The world's most popular open source database." [Online]. Available: http://www.mysql.com. [Accessed 17 12 2014].
[35] Apache, "Apache JMeter." [Online]. Available: http://jmeter.apache.org/. [Accessed 17 12 2014].
[36] R. A. Popa, C. Redfield, S. Tu, H. Balakrishnan, F. Kaashoek, S. Madden, N. Zeldovich and A. Burrows, "CryptDB." [Online]. Available: http://css.csail.mit.edu/cryptdb/. [Accessed 17 12 2014].
We initiate the formal study of functional encryption by giving precise definitions of the concept and its security. Roughly speaking, functional encryption supports restricted secret keys that enable a key holder to learn a specific function of encrypted data, but learn nothing else about the data. For example, given an encrypted program the secret key may enable the key holder to learn the output of the program on a specific input without learning anything else about the program. We show that defining security for functional encryption is non-trivial. First, we show that a natural game-based definition is inadequate for some functionalities. We then present a natural simulation-based definition and show that it (provably) cannot be satisfied in the standard model, but can be satisfied in the random oracle model. We show how to map many existing concepts to our formalization of functional encryption and conclude with several interesting open problems in this young area.