
Enhanced Privacy Preserving Model for Data Using (α, β, k)-Anonymity Model and Lossy join

November 2015

13(11):60-67

Authors:

Abou-El-Ela Abdou Hussien

Shaqra University

Nagy Ramadan

Cairo University


International Journal of Computer Science and Information Security (IJCSIS),

Vol. 13, No. 11, November 2015

http://sites.google.com/site/ijcsis/

ISSN 1947-5500



Abstract—This paper aims to provide enhancements to the privacy preserving model published in our previous paper entitled "An Effective Privacy Preserving Model for Databases Using (α, β, k) - Anonymity Model and Lossy Join" [1]. The previous paper includes a model that maintains the privacy of multiple sensitive attributes after the data are published in two tables: one for QI-tuples and the other for sensitive attributes. This model uses connecting numbers that depend on one of the sensitive attributes, as in the lossy join technique. The authors found that, in some cases, a problem may arise in retrieving the exact frequency of any of the remaining sensitive attributes if they are not treated as a set of attributes in the same tuple of the sensitive attributes table. In other words, the frequency of any one of the remaining sensitive attributes differs from the frequency of the same attribute in the original table, especially if the researcher does not use all sensitive attributes in the same tuple together as a set. This problem may affect the ability of researchers to utilize the data and, consequently, the accuracy of their research. This paper proposes a solution to this problem: adding frequency details, in the published sensitive data table, for the sensitive attributes that are not used in making connecting numbers. The solution increases data utility and improves research accuracy.

Index Terms—Privacy Preserving Model, Anatomy Technique, Lossy Join, Multiple Sensitive Attributes, Connecting Numbers.

I. INTRODUCTION

Data mining is an increasingly important technology for extracting useful knowledge hidden in huge collections of data [2-6]. Data mining can also be defined as the process of analyzing large quantities of data in order to discover meaningful patterns and rules. There are, however, negative social perceptions about data mining, among them potential privacy violation and potential discrimination [7, 8]. Any data mining model generally assumes that the underlying data is freely accessible. Privacy violation, the former of these concerns, is the unintentional or deliberate disclosure of a user profile or activity data as part of the output of a data mining algorithm or as a result of data sharing. Even data from which identifiers have been removed is not secure, because it remains open to linking attacks [9]. For this reason, privacy preserving data mining (PPDM) has been introduced to protect individual privacy. PPDM has become increasingly important because it allows privacy-sensitive attributes to be shared for analytical purposes.

Many privacy techniques have been developed; one of the most common is k-anonymity, an emerging concept for protecting released data [10-15]. Anonymity typically refers to the state in which an individual's personal identity, or personally identifiable information, is publicly unknown. When released information is linked with a confidential table, data disclosure may result, and the anonymity model was introduced to control such linking attacks. The k-anonymity model transforms the quasi-identifiers (the attributes responsible for linking attacks) in such a manner that an adversary cannot infer the sensitive attributes related to them. On the other hand, it is difficult for a data publisher to generate an anonymous table when multiple sensitive attributes are present in the data set, because concentrating on protecting one sensitive attribute may cause disclosure of identity through another one [14]. An attempt to solve that problem was introduced in [1], which proposed a model that maintains the privacy of multiple sensitive attributes. This previous model solves the problem by publishing the data in two tables: one for QI-tuples and the other for sensitive attributes. It uses connecting numbers that depend on one of the sensitive attributes. In the model proposed in [1], a problem may arise if a researcher intends to know the frequency of any one of the remaining sensitive attributes: the authors found that this frequency differs from that in the original table, especially if the researcher does not treat all sensitive attributes together as a set. Therefore, the authors propose an enhanced model that avoids this problem by including frequency details in the published sensitive attributes table. These frequency details enable a researcher to know exactly the correct frequency of each of the remaining sensitive attributes, as explained later in this paper.

In the next section, the authors discuss multiple sensitive attributes. Section III presents a previous attempt at privacy preservation for databases, the (α, β, k)-anonymity model, and the application of lossy join with k-anonymity techniques. Section IV presents privacy preservation using the anatomy technique. Section V introduces the implementation of the enhanced proposed model, and Section VI concludes the paper.


Abou_el_ela Abdo Hussien1, Nagy Ramadan Darwish2

1Department of Computer Science, Shaqra University, KSA,

abo_el_ela_2004@yahoo.com

2Department of Computer and Information Sciences, Institute of Statistical Studies and Research, Cairo University,

drnagyd@yahoo.com


II. MULTIPLE SENSITIVE ATTRIBUTES

A sensitive attribute is an attribute whose value for a particular individual must be kept secret from people who have no direct access to the original data [1, 12]. A data publisher needs to prevent privacy disclosure, in which someone attacks the published table "T" and learns an individual's confidential information, for example that he suffers from some kind of dangerous disease [13]. Information disclosure can be of three types [1, 14]:

• Attribute disclosure: sensitive attribute information of an individual is disclosed.

• Identity disclosure: an individual is linked to a particular record in the published data.

• Membership disclosure: information about whether or not an individual's record is in the published data is disclosed.

The k-anonymity model was introduced to protect sensitive attributes from interlopers: if an adversary wants to determine an individual's identity and has knowledge about the quasi-identifiers, he will find at least k-1 other records that share the same quasi-identifier values [14]. On the other hand, when multiple sensitive attributes are present in the records, data publishers face a big problem in maintaining privacy for all these attributes together. Table I shows 4-anonymous inpatient microdata and Table II shows a description of the dataset [14, 15]. Table II includes the sensitive attributes "Medical Status", "Occupation", and "Annual Income". When a data publisher concentrates on protecting one sensitive attribute, this may cause disclosure of identity through another one [14]. Therefore, we need a model that controls all sensitive attributes together.
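The k-anonymity requirement described above can be checked mechanically: group the records by their quasi-identifier values and verify that every group holds at least k records. The following Python sketch does this for the generalized rows of Table IV (Section III). Treating only Age and Zip code as the quasi-identifiers is an assumption, because the Sex values of Table IV are not available, and the helper names `qi_group_sizes` and `is_k_anonymous` are hypothetical.

```python
from collections import Counter

# Generalized rows of Table IV as (age_range, zip_code, salary, disease).
# Assumption: Sex is omitted because its values are not available.
TABLE_IV = [
    ("[25-30]", "66***", 6000, "Headache"),
    ("[25-30]", "66***", 4000, "Depression"),
    ("[25-30]", "66***", 2000, "Depression"),
    ("[25-30]", "66***", 6000, "Paranoia"),
    ("[35-40]", "6****", 5000, "Catatonia"),
    ("[35-40]", "6****", 2000, "Paranoia"),
    ("[35-40]", "6****", 6000, "Catatonia"),
    ("[35-40]", "6****", 2000, "Insomnia"),
]


def qi_group_sizes(rows, qi_indexes):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[i] for i in qi_indexes) for row in rows)


def is_k_anonymous(rows, qi_indexes, k):
    """True if every QI-group contains at least k rows (hypothetical helper)."""
    return all(size >= k for size in qi_group_sizes(rows, qi_indexes).values())


if __name__ == "__main__":
    print(qi_group_sizes(TABLE_IV, (0, 1)))
    print(is_k_anonymous(TABLE_IV, (0, 1), 4))  # True: each group has 4 rows
    print(is_k_anonymous(TABLE_IV, (0, 1), 5))  # False
```

Note that passing this check says nothing about the sensitive values inside each group, which is exactly why the attacks discussed in Section III remain possible.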

III. A PREVIOUS ATTEMPT OF PRIVACY PRESERVING FOR DATABASES

In this section, the authors present the previous paper entitled "An Effective Privacy Preserving Model for Databases Using (α, β, k) - Anonymity Model and Lossy Join" [1]. The previous paper introduced a model that solves the problem, described in Section II, of maintaining the privacy of multiple sensitive attributes by publishing the data in two tables: one for QI-tuples and the other for sensitive attributes. In the following sub-sections, the authors present the problem definition of the previous proposed model in [1], the (α, β, k)-anonymity model, and the previous proposed algorithm for using the k-anonymity model with lossy join, which helps to solve the problem of protecting the privacy of multiple sensitive attributes [1].

A. The Previous Proposed Model Problem Definition:

An attacker could learn the identities and exact QI-attribute values of all individuals by using background knowledge [16]. This background knowledge can be obtained from external tables and can be contained in an equivalence class. Our previous proposed model was intended to solve this problem for multiple sensitive attributes, as explained using the following example:

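TABLE I

4-Anonymous Inpatient Microdata

Non-sensitive attributes: Zip Code, Age, Nationality. Sensitive attribute: Medical Status.

| Ser. No | Zip Code | Age | Nationality | Medical Status |
| 1 | 130*** | >30 |  | Heart Disease |
| 2 | 130*** | >30 |  | Heart Disease |
| 3 | 130*** | >30 |  | HIV |
| 4 | 130*** | >30 |  | HIV |
| 5 | 1485** | ≥40 |  | Cancer |
| 6 | 1485** | ≥40 |  | Heart Disease |
| 7 | 1485** | ≥40 |  | HIV |
| 8 | 1485** | ≥40 |  | HIV |
| 9 | 130*** |  |  | Cancer |
| 10 | 130*** |  |  | Cancer |
| 11 | 130*** |  |  | Cancer |
| 12 | 130*** |  |  | Cancer |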

TABLE II

Classification of Attributes

| Ser. No | Attribute | Type |
| 1 | ZIPCODE | NON-SENSITIVE |
| 2 | AGE | NON-SENSITIVE |
| 3 | NATIONALITY | NON-SENSITIVE |
| 4 | MEDICAL_STATUS | SENSITIVE |
| 5 | OCCUPATION | SENSITIVE |
| 6 | ANNUAL_INCOME | SENSITIVE |

• Assume the data in Table III need to be published by publishers such as a hospital or an insurance company. Both disease and salary are sensitive attributes.

• Table IV is an anonymized version of Table III. Although the disease and salary attributes both conform to 3-diversity rules in Table IV, the table cannot prevent background knowledge attacks, as explained in the following cases:

o If an attacker knows that someone named "Ali" is in the second QI-group, and knows from background knowledge that Ali's salary is not "2000", then the attacker can infer that "Ali" suffers from "Catatonia".

o If an attacker knows that someone named "Iman" is in the first QI-group, and knows that Iman's salary is not "6000", then the attacker can infer that "Iman" suffers from "Depression".

o Although the sensitive attributes conform to L-diversity [17], private information is still leaked. The main reason is the low diversity between multiple sensitive attributes.
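The two attacks above can be reproduced with a short Python sketch, assuming the attacker knows an individual's QI-group in Table IV and can rule out one salary value from background knowledge. The helper names `distinct_count` and `candidate_diseases` are hypothetical; the second one simply keeps the tuples of the group whose salary was not excluded and returns the diseases that remain.

```python
# QI-groups of Table IV, each a list of (salary, disease) pairs.
GROUPS = {
    1: [(6000, "Headache"), (4000, "Depression"),
        (2000, "Depression"), (6000, "Paranoia")],
    2: [(5000, "Catatonia"), (2000, "Paranoia"),
        (6000, "Catatonia"), (2000, "Insomnia")],
}


def distinct_count(group, position):
    """Distinct values of one sensitive attribute (0 = salary, 1 = disease)."""
    return len({pair[position] for pair in group})


def candidate_diseases(group, excluded_salary):
    """Diseases still possible once the attacker rules out one salary value.

    Assumption: the attacker already knows which QI-group the victim is in.
    """
    return {disease for salary, disease in group if salary != excluded_salary}


if __name__ == "__main__":
    for gid, group in GROUPS.items():
        # Both sensitive attributes are 3-diverse in each group.
        print(gid, distinct_count(group, 0), distinct_count(group, 1))
    print(candidate_diseases(GROUPS[2], 2000))  # {'Catatonia'}: "Ali" is exposed
    print(candidate_diseases(GROUPS[1], 6000))  # {'Depression'}: "Iman" is exposed
```

Each group is 3-diverse in both attributes taken separately, yet a single excluded salary value collapses the disease candidates to one, which illustrates the low diversity between the two sensitive attributes.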

The previous proposed model in [1] solved this problem,

maintaining the privacy of the data to a large extent, as

explained in the following subsections.


TABLE III

Microdata

| Sex | Age | Zip code | Salary(S1) | Disease(S2) |
|  |  | 66212 | 6000 | Headache |
|  |  | 66251 | 4000 | Depression |
|  |  | 66253 | 2000 | Depression |
|  |  | 66252 | 6000 | Paranoia |
|  |  | 63223 | 5000 | Catatonia |
|  |  | 65262 | 2000 | Paranoia |
|  |  | 63232 | 6000 | Catatonia |
|  |  | 65261 | 2000 | Insomnia |

TABLE IV

Anonymized Table

| Sex | Age | Zip code | Salary(S1) | Disease(S2) |
|  | [25-30] | 66*** | 6000 | Headache |
|  | [25-30] | 66*** | 4000 | Depression |
|  | [25-30] | 66*** | 2000 | Depression |
|  | [25-30] | 66*** | 6000 | Paranoia |
|  | [35-40] | 6**** | 5000 | Catatonia |
|  | [35-40] | 6**** | 2000 | Paranoia |
|  | [35-40] | 6**** | 6000 | Catatonia |
|  | [35-40] | 6**** | 2000 | Insomnia |

B. (α, β, k)-Anonymity Model

Let a table "T" contain a set of attributes (A1, ..., An). These attributes can be divided into two separate categories: the first category represents non-sensitive attributes (Q1, ..., Qm) and the second represents sensitive attributes (S1, ..., Si). The number of tuples in a QI-group is QIn [18]. The number of distinct values of sensitive attribute Si in a QI-group is nSi, and nS'i is the corresponding number of distinct values of Si among the tuples that share the same value of Si-1. "T" is said to satisfy (α, β, k)-anonymity if and only if:

1) T satisfies k-anonymity,

2) each sensitive attribute has at least β distinct values (2 ≤ β ≤ k) within the same QI-group, and

3) α = nSi - nS'i ≠ 1 in each QI-group of tuples.

To illustrate this anonymity approach, we analyze the data in Table IV, which satisfies 4-anonymity with respect to Sex, Age and Zip code and includes two QI-groups.

• The first group has three different diseases and three different salaries.

• The second group also has three different diseases and three different salaries; therefore, β = 3.

• In the first group, nS1 = nS2 = 3 and nS'2 = 2, because the distinct disease values "Headache" and "Paranoia" correspond to the same salary value {6000, 6000}.

• Thus, α = nS2 - nS'2 = 3 - 2 = 1, so the table does not satisfy (α, β, k)-anonymity.
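The three conditions can be evaluated per QI-group with the Python sketch below, written for the two-attribute case used in this section (S1 = Salary, S2 = Disease). Two points are interpretations rather than statements from [1]: nS'2 is computed as the largest number of distinct diseases that share a single salary value, which reproduces the values reported for Table IV (nS'2 = 2) and for Table XII (nS'2 = 3); and condition 2 is read as "each sensitive attribute has at least β distinct values in every QI-group". The function names are hypothetical.

```python
from collections import defaultdict


def alpha(group):
    """alpha = nS2 - nS'2 for one QI-group of (salary, disease) pairs.

    Assumption: nS'2 is the maximum number of distinct diseases that
    correspond to one and the same salary value.
    """
    n_s2 = len({disease for _, disease in group})
    by_salary = defaultdict(set)
    for salary, disease in group:
        by_salary[salary].add(disease)
    n_s2_prime = max(len(diseases) for diseases in by_salary.values())
    return n_s2 - n_s2_prime


def satisfies_alpha_beta_k(groups, beta, k):
    """Check k-anonymity, beta distinct values per attribute, and alpha != 1."""
    if not 2 <= beta <= k:
        raise ValueError("the model requires 2 <= beta <= k")
    for group in groups:
        if len(group) < k:
            return False
        if min(len({s for s, _ in group}), len({d for _, d in group})) < beta:
            return False
        if alpha(group) == 1:
            return False
    return True


# QI-groups of Table IV as (salary, disease) pairs.
TABLE_IV_GROUPS = [
    [(6000, "Headache"), (4000, "Depression"), (2000, "Depression"), (6000, "Paranoia")],
    [(5000, "Catatonia"), (2000, "Paranoia"), (6000, "Catatonia"), (2000, "Insomnia")],
]

if __name__ == "__main__":
    print([alpha(g) for g in TABLE_IV_GROUPS])
    print(satisfies_alpha_beta_k(TABLE_IV_GROUPS, beta=3, k=4))  # False
```

Under the same interpretation, the two seven-tuple QI-groups of Table XII give α = 5 - 3 = 2, so `satisfies_alpha_beta_k` accepts them with β = 3 and k = 7, in line with the analysis of the lossy join result later in Section III.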

From the previous analysis we know that Table IV leads to a leakage of private information: if α = 1, an attacker with background knowledge can cause a leakage. The model previously proposed in [1] was adopted to solve this problem.

C. Applying Lossy Join with K-anonymity Technique

This section introduces the lossy join technique, explaining how it helps to conceal sensitive attributes and how to apply it with (α, β, k)-anonymity.

1) The Lossy Join Technique

In recent work, lossy join has been shown to be useful in privacy preserving data publishing [19]. The idea of this technique is that if two tables with a join attribute are published, the join of the two tables can be lossy, and this lossy join helps to conceal private information. The idea of lossy join is used to derive a new mechanism for achieving a similar privacy preservation target.

• Consider the example in Table V, a (0.5, 2)-anonymization. From this table, we can generate a Temp table, as shown in Table VI.

• For each equivalence class "E" in the anonymized table, the author assigns a unique identifier (ID) to "E" and to all tuples in "E".

• Then, the author attaches the corresponding ID to each tuple in the original raw table and forms a new table, "Temp" (Table VI).

• From the Temp table, we can generate two separate tables, Tables VII (a) and VII (b).

• The two tables share the attribute ClassID.

• If we join these two tables on ClassID, it is easy to see that the join is lossy: the Temp table cannot be recovered from the result of the join.

• The result of joining the two tables is given in Table VIII.
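The steps above can be sketched in Python as follows. The raw tuples are those of Table V; the class identifiers 1, 2 and 3 are an assumption, chosen to match the grouping of the join result in Table VIII, since the ClassID values themselves are not reproduced in the tables. The function names `build_temp`, `split` and `join` are hypothetical.

```python
# Raw tuples of Table V: (job, birth, postcode, disease).
RAW = [
    ("clerk", 1975, 4350, "HIV"),
    ("manager", 1955, 4350, "flu"),
    ("clerk", 1955, 5432, "flu"),
    ("factory worker", 1955, 5432, "fever"),
    ("factory worker", 1975, 4350, "flu"),
    ("technical supporter", 1940, 4350, "fever"),
]
# Assumed ClassID of each raw tuple (one ID per equivalence class).
CLASS_IDS = [1, 1, 2, 2, 3, 3]


def build_temp(raw, class_ids):
    """Attach the ClassID of its equivalence class to every raw tuple (Temp table)."""
    return [row + (cid,) for row, cid in zip(raw, class_ids)]


def split(temp):
    """Project Temp into NSS (QI values + ClassID) and SS (ClassID + disease)."""
    nss = [(job, birth, post, cid) for job, birth, post, _, cid in temp]
    ss = sorted({(cid, disease) for _, _, _, disease, cid in temp})
    return nss, ss


def join(nss, ss):
    """Natural join of NSS and SS on ClassID."""
    return [(job, birth, post, disease, cid)
            for job, birth, post, cid in nss
            for ss_cid, disease in ss if ss_cid == cid]


if __name__ == "__main__":
    temp = build_temp(RAW, CLASS_IDS)
    nss, ss = split(temp)
    joined = join(nss, ss)
    print(len(temp), len(joined))  # 6 12: the join adds spurious tuples
```

Every Temp tuple survives the join, but each one is mixed with a spurious tuple from its class, so the original association cannot be singled out; that is the sense in which the join is lossy.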

TABLE V

A (0.5, 2)-anonymization Table

| Job | Birth | Postcode | Disease |
| clerk | 1975 | 4350 | HIV |
| manager | 1955 | 4350 | flu |
| clerk | 1955 | 5432 | flu |
| factory worker | 1955 | 5432 | fever |
| factory worker | 1975 | 4350 | flu |
| technical supporter | 1940 | 4350 | fever |

TABLE VI

Temp Table

| Job | Birth | Postcode | Disease | ClassID |
| clerk | 1975 | 4350 | HIV |  |
| manager | 1955 | 4350 | flu |  |
| clerk | 1955 | 5432 | flu |  |
| factory worker | 1955 | 5432 | fever |  |
| factory worker | 1975 | 4350 | flu |  |
| technical supporter | 1940 | 4350 | fever |  |

TABLE VII (a)

NSS Table

| Job | Birth | Postcode | ClassID |
| clerk | 1975 | 4350 |  |
| manager | 1955 | 4350 |  |
| clerk | 1955 | 5432 |  |
| factory worker | 1955 | 5432 |  |
| factory worker | 1975 | 4350 |  |
| technical supporter | 1940 | 4350 |  |


TABLE VII (b)

SS Table

| ClassID | Disease |
|  | HIV |
|  | flu |
|  | fever |
|  | flu |
|  | fever |

TABLE VIII

Joining the Two Tables VII (a) and VII (b)

| Job | Birth | Postcode | Disease | Class |
| clerk | 1975 | 4350 | HIV |  |
| manager | 1955 | 4350 | HIV |  |
| clerk | 1975 | 4350 | flu |  |
| manager | 1955 | 4350 | flu |  |
| clerk | 1955 | 5432 | flu |  |
| factory worker | 1955 | 5432 | flu |  |
| clerk | 1955 | 5432 | fever |  |
| factory worker | 1955 | 5432 | fever |  |
| factory worker | 1975 | 4350 | flu |  |
| technical supporter | 1940 | 4350 | flu |  |
| factory worker | 1975 | 4350 | fever |  |
| technical supporter | 1940 | 4350 | fever |  |

• From the lossy join, each individual is linked to at least 2 values of the sensitive attribute. Therefore, the required privacy of each individual can be guaranteed.

• In the joined table, for each individual there are at least 2 individuals linked to the same bag "B" of sensitive attribute values, so that they cannot be distinguished by their sensitive values.

• The first record in the raw table (QID = (clerk, 1975, 4350)) is linked to the bag {HIV, flu}.

• The second individual (QID = (manager, 1955, 4350)) is also linked to the same bag "B" of sensitive attribute values.

• This is the goal of k-anonymity for the protection of sensitive attribute values.
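The two properties claimed above can be checked directly on the join result. The Python sketch below takes the twelve tuples of Table VIII, using "technical supporter" as the job of the last individual as in Table V, builds the bag of diseases linked to each QID, and verifies that every bag holds at least 2 values and is shared by at least 2 QIDs. The helper names `bags_by_qid` and `check_bags` are hypothetical.

```python
from collections import defaultdict

# Join result of Table VIII: (job, birth, postcode, disease).
JOINED = [
    ("clerk", 1975, 4350, "HIV"), ("manager", 1955, 4350, "HIV"),
    ("clerk", 1975, 4350, "flu"), ("manager", 1955, 4350, "flu"),
    ("clerk", 1955, 5432, "flu"), ("factory worker", 1955, 5432, "flu"),
    ("clerk", 1955, 5432, "fever"), ("factory worker", 1955, 5432, "fever"),
    ("factory worker", 1975, 4350, "flu"), ("technical supporter", 1940, 4350, "flu"),
    ("factory worker", 1975, 4350, "fever"), ("technical supporter", 1940, 4350, "fever"),
]


def bags_by_qid(rows):
    """Map each QID (job, birth, postcode) to the bag of diseases it joins with."""
    bags = defaultdict(set)
    for job, birth, post, disease in rows:
        bags[(job, birth, post)].add(disease)
    return {qid: frozenset(bag) for qid, bag in bags.items()}


def check_bags(rows, k=2):
    """True if every bag has >= k values and is shared by >= k QIDs."""
    bags = bags_by_qid(rows)
    sharing = defaultdict(int)
    for bag in bags.values():
        sharing[bag] += 1
    return all(len(bag) >= k and sharing[bag] >= k for bag in bags.values())


if __name__ == "__main__":
    print(sorted(bags_by_qid(JOINED)[("clerk", 1975, 4350)]))  # ['HIV', 'flu']
    print(check_bags(JOINED))  # True
```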

2) Applying the Lossy Join Approach with the (α, β, k)-Anonymity Model

The previous proposed model in [1] adopts the lossy join technique to solve the problem described in sub-section III.A. The author assigns a distinct number to each salary value, as shown in the "Connecting Numbers" column of Table IX, and then uses these numbers to build both Table X and Table XI. Joining Tables X and XI on these connecting numbers produces Table XII.
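The construction described above can be sketched in Python: each distinct salary receives a connecting number, the QI part and the sensitive part of Table IX are published separately, and the join on connecting numbers, keeping distinct rows, yields the fourteen tuples of Table XII. Numbering salaries in order of first appearance is an assumption that happens to reproduce the numbers 1 to 4 of Table IX; the Sex attribute is omitted because its values are not available, and the function names are hypothetical.

```python
# Tuples of Table IX without Sex: (age_range, zip_code, salary, disease).
TABLE_IX = [
    ("[25-30]", "66***", 6000, "Headache"),
    ("[25-30]", "66***", 4000, "Depression"),
    ("[25-30]", "66***", 2000, "Depression"),
    ("[25-30]", "66***", 6000, "Paranoia"),
    ("[35-40]", "6****", 5000, "Catatonia"),
    ("[35-40]", "6****", 2000, "Paranoia"),
    ("[35-40]", "6****", 6000, "Catatonia"),
    ("[35-40]", "6****", 2000, "Insomnia"),
]


def connecting_numbers(rows):
    """Give every distinct salary a number (assumed: order of first appearance)."""
    numbers = {}
    for _, _, salary, _ in rows:
        numbers.setdefault(salary, len(numbers) + 1)
    return numbers


def publish(rows):
    """Split into the QI table (Table X) and the sensitive table (Table XI)."""
    numbers = connecting_numbers(rows)
    qi_table = [(numbers[salary], age, zip_code) for age, zip_code, salary, _ in rows]
    sensitive_table = [(numbers[salary], salary, disease) for _, _, salary, disease in rows]
    return qi_table, sensitive_table


def join_on_numbers(qi_table, sensitive_table):
    """Distinct rows of the join of Tables X and XI on the connecting number."""
    joined = []
    for number, age, zip_code in qi_table:
        for s_number, salary, disease in sensitive_table:
            row = (number, age, zip_code, salary, disease)
            if s_number == number and row not in joined:
                joined.append(row)
    return joined


if __name__ == "__main__":
    qi_table, sensitive_table = publish(TABLE_IX)
    joined = join_on_numbers(qi_table, sensitive_table)
    print(connecting_numbers(TABLE_IX))  # {6000: 1, 4000: 2, 2000: 3, 5000: 4}
    print(len(joined))  # 14: seven tuples per QI-group, as in Table XII
```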

To illustrate this anonymity approach, we analyze the data in Table XII, which satisfies 7-anonymity with respect to "Sex", "Age" and "Zip code" and includes two QI-groups [1], as follows:

• The first group has five different diseases and three different salaries.

• The second group also has five different diseases and three different salaries; therefore, β is at least 3.

TABLE IX

Anonymized Table with Connecting Numbers

| Tuple | Connecting Numbers | Sex | Age | Zip code | Salary(S1) | Disease(S2) |
| T1 | 1 (for 6000) |  | [25-30] | 66*** | 6000 | Headache |
| T2 | 2 (for 4000) |  | [25-30] | 66*** | 4000 | Depression |
| T3 | 3 (for 2000) |  | [25-30] | 66*** | 2000 | Depression |
| T4 | 1 (for 6000) |  | [25-30] | 66*** | 6000 | Paranoia |
| T5 | 4 (for 5000) |  | [35-40] | 6**** | 5000 | Catatonia |
| T6 | 3 (for 2000) |  | [35-40] | 6**** | 2000 | Paranoia |
| T7 | 1 (for 6000) |  | [35-40] | 6**** | 6000 | Catatonia |
| T8 | 3 (for 2000) |  | [35-40] | 6**** | 2000 | Insomnia |

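TABLE X

QI-Tuples with Connecting Numbers

| Connecting Numbers | Sex | Age | Zip code |
| 1 |  | [25-30] | 66*** |
| 2 |  | [25-30] | 66*** |
| 3 |  | [25-30] | 66*** |
| 1 |  | [25-30] | 66*** |
| 4 |  | [35-40] | 6**** |
| 3 |  | [35-40] | 6**** |
| 1 |  | [35-40] | 6**** |
| 3 |  | [35-40] | 6**** |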

TABLE XI

Sensitive Attributes with Connecting Numbers

| Connecting Numbers | Salary(S1) | Disease(S2) |
| 1 | 6000 | Headache |
| 2 | 4000 | Depression |
| 3 | 2000 | Depression |
| 1 | 6000 | Paranoia |
| 4 | 5000 | Catatonia |
| 3 | 2000 | Paranoia |
| 1 | 6000 | Catatonia |
| 3 | 2000 | Insomnia |

TABLE XII

Tuples with Sensitive Attributes Using Connecting Numbers

| Tuple | Connecting Numbers | Sex | Age | Zip code | Salary(S1) | Disease(S2) |
| T1 | 1 |  | [25-30] | 66*** | 6000 | Headache |
| T2 | 1 |  | [25-30] | 66*** | 6000 | Paranoia |
| T3 | 1 |  | [25-30] | 66*** | 6000 | Catatonia |
| T4 | 2 |  | [25-30] | 66*** | 4000 | Depression |
| T5 | 3 |  | [25-30] | 66*** | 2000 | Paranoia |
| T6 | 3 |  | [25-30] | 66*** | 2000 | Depression |
| T7 | 3 |  | [25-30] | 66*** | 2000 | Insomnia |
| T8 | 3 |  | [35-40] | 6**** | 2000 | Paranoia |
| T9 | 3 |  | [35-40] | 6**** | 2000 | Depression |
| T10 | 3 |  | [35-40] | 6**** | 2000 | Insomnia |
| T11 | 4 |  | [35-40] | 6**** | 5000 | Catatonia |
| T12 | 1 |  | [35-40] | 6**** | 6000 | Headache |
| T13 | 1 |  | [35-40] | 6**** | 6000 | Paranoia |
| T14 | 1 |  | [35-40] | 6**** | 6000 | Catatonia |

• In the first group, nS1 = 3, nS2 = 5 and nS'2 = 3, because the distinct Disease values "Headache", "Paranoia" and "Catatonia" correspond to the same Salary value {6000, 6000, 6000}, and

o the distinct Disease values "Depression", "Paranoia" and "Insomnia" correspond to the same Salary value {2000, 2000, 2000}.

o Thus, α = nS2 - nS'2 = 5 - 3 = 2, and the group satisfies (α, β, k)-anonymity.

• In the second group, nS1 = 3, nS2 = 5 and nS'2 = 3, because the distinct Disease values "Headache", "Paranoia" and "Catatonia" correspond to the same Salary value {6000, 6000, 6000}, and

o the distinct Disease values "Depression", "Paranoia" and "Insomnia" correspond to the same Salary value {2000, 2000, 2000}.

o Thus, α = nS2 - nS'2 = 5 - 3 = 2, and the group satisfies (α, β, k)-anonymity.

Figure I shows the (α, β, k) test architecture, and Figure II shows the architecture of the previous proposed model.

IV. PRIVACY PRESERVING USING ANATOMY TECHNIQUE

The anatomy technique releases two separate tables, a Quasi-Identifier (QI) attributes table and a Sensitive Table (ST) for the Sensitive Attributes (SA), instead of publishing a single table with generalized values [20, 21]. There is no need to modify the original values, because anatomy releases all QI values and the ST directly in two separate tables, which meet the L-diversity privacy requirement [20]. The anatomy technique was proposed to overcome a disadvantage of generalization, which often loses considerable information in the microdata. Anatomy captures the exact QI-distribution and releases two tables, a quasi-identifier table (QIT) and a sensitive table (ST), which separate QI-values from sensitive attribute values. For example, Tables XIV (a) and XIV (b) show the QIT and the ST, respectively, obtained from the microdata in Table XIII [20]. The methodology of the technique can be explained as follows:

• First, the records of the microdata are partitioned into different QI-groups based on a certain strategy. Following the grouping in Table XIII, records "1" to "4" are grouped into QI-group "1" and records "5" to "8" into QI-group "2".

• Second, the quasi-identifier table (QIT) is created. Specifically, for each record in Table XIII, the QIT (Table XIV (a)) includes all its exact QI-values, together with its group membership in a new column, Group-ID. However, the QIT does not contain any disease value.

• Finally, the ST (Table XIV (b)) maintains the disease statistics of each QI-group.

The QIT does not indicate the sensitive value of any record; that value can only be guessed at random from the ST, so anatomy preserves privacy. To explain this, consider an adversary who knows the age "25" and Zip code "11500" of "Ali". From the QIT (Table XIV (a)), the adversary learns that record "1" belongs to "Ali", but so far obtains no information about his disease. Instead, s/he obtains the ID "1" of the QI-group containing record "1". Judging from the ST (Table XIV (b)), the adversary realizes that, among the "4" records in QI-group "1", 50% are associated with "pneumonia" and 50% with "dyspepsia" in the microdata. Note that s/he gains no additional information about the exact diseases carried by these records. Hence, s/he can only conclude that "Ali" has contracted "pneumonia" (or "dyspepsia") with 50% probability.
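The anatomy steps and the adversary's reasoning above can be sketched in Python using the Zip codes, diseases and group assignment of Table XIII; Age and Sex are omitted because their values are not available. The function names `anatomize` and `disease_probability` are hypothetical: the first builds the QIT and the ST counts, and the second returns the share of a disease among the ST counts of a record's QI-group.

```python
from collections import Counter

# Microdata of Table XIII: (tuple_id, zip_code, disease); Age and Sex omitted.
MICRODATA = [
    (1, 11500, "pneumonia"), (2, 13200, "dyspepsia"),
    (3, 59300, "dyspepsia"), (4, 12700, "pneumonia"),
    (5, 54600, "flu"), (6, 25200, "gastritis"),
    (7, 25100, "flu"), (8, 31000, "bronchitis"),
]


def anatomize(microdata, group_of):
    """Return the QIT (exact QI values + Group-ID) and the ST (Group-ID, disease) counts."""
    qit = [(tid, zip_code, group_of(tid)) for tid, zip_code, _ in microdata]
    st = Counter((group_of(tid), disease) for tid, _, disease in microdata)
    return qit, st


def disease_probability(qit, st, tuple_id, disease):
    """Probability an adversary can assign to `disease` for record `tuple_id`."""
    group = next(g for tid, _, g in qit if tid == tuple_id)
    group_total = sum(count for (g, _), count in st.items() if g == group)
    return st[(group, disease)] / group_total


if __name__ == "__main__":
    # Records 1-4 form QI-group 1 and records 5-8 form QI-group 2.
    qit, st = anatomize(MICRODATA, lambda tid: 1 if tid <= 4 else 2)
    print(st[(2, "flu")])  # 2
    print(disease_probability(qit, st, 1, "pneumonia"))  # 0.5
```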

FIGURE I

(α, β, k) Test Architecture

FIGURE II

Previous Proposed Technique Architecture


V. IMPLEMENTATION OF THE ENHANCED PROPOSED MODEL

This section defines the present problem, illustrates it with an example, and explains how the enhanced proposed model solves it.

A. Present Problem Definition

In the model previously proposed in [1], it is assumed that the researcher takes all sensitive attributes in the same tuple of the sensitive table (ST) as a set. If the researcher splits this tuple set into separate sensitive attribute values, he may face a problem, especially if he needs to know the frequency of each separate sensitive attribute (except the attribute used as the basis for connecting numbers). The authors noticed this problem when applying the previous proposed model, as explained in the following example:

When the authors took the two published Tables X and XI mentioned before, they noticed that if the researcher wants to know the exact number of people who have the same sensitive attribute value, he cannot reach the correct number, as explained in the next two cases:

• Case I: When the researcher tries to calculate the total number of people who have the same salary set, he can get the frequency from Table XI by counting the occurrences of each connecting number in that table, as shown in Table XV (a). From Table XV (a), for example, the salary set (6000) has frequency 3, which is exactly the frequency in the original Table IX (tuples "T1", "T4" and "T7"). The same holds for all other salary sets, which give the same frequencies as in the original Table IX. The frequency is easy to retrieve because the salary set is used as the basis for connecting numbers between the two published tables.

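TABLE XIV (a)

The Quasi-identifier Table (QIT)

| Row Number | Age | Sex | Zipcode | Group-ID |
| 1 (Ali) |  |  | 11500 | 1 |
| 2 |  |  | 13200 | 1 |
| 3 |  |  | 59300 | 1 |
| 4 |  |  | 12700 | 1 |
| 5 |  |  | 54600 | 2 |
| 6 |  |  | 25200 | 2 |
| 7 (Hoda) |  |  | 25100 | 2 |
| 8 |  |  | 31000 | 2 |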

TABLE XIV (b)

The Sensitive Table (ST)

| Group-ID | Disease | Count |
| 1 | dyspepsia | 2 |
| 1 | pneumonia | 2 |
| 2 | bronchitis | 1 |
| 2 | flu | 2 |
| 2 | gastritis | 1 |

• Case II: When the researcher tries to calculate the total number of people who have the same disease (for example "Depression"), he can return to Table XII to learn that "Depression" has connecting numbers "2" and "3". When the researcher returns to Table X and places "Depression" in front of the same connecting numbers "2" and "3", he can build Table XV (b). From Table XV (b), the researcher finds that the total number of people who are sick with "Depression" is "4" (the rows marked "Depression" in Table XV (b)). This number differs from the number in the original Table IX (tuples "T2" and "T3"), which is only "2", and this consequently harms the accuracy of research results.

From the discussion above, it is clear that there is no problem with the frequency of the sensitive attribute used as the basis for connecting numbers (Salary); the problem arises when we try to determine the frequency of the other sensitive attribute (Disease).
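Both cases can be reproduced from the two published tables with the Python sketch below. Counting salaries (equivalently, connecting numbers) in Table XI recovers the salary frequencies exactly, as in Case I, while the Case II procedure, which counts the QI-tuples of Table X whose connecting number is linked to "Depression", yields 4 instead of the true 2. The connecting numbers 1, 2, 3, 1, 4, 3, 1, 3 are those of Table IX; the function names are hypothetical.

```python
from collections import Counter

# Table X: connecting number of each published QI-tuple (QI values omitted).
QI_NUMBERS = [1, 2, 3, 1, 4, 3, 1, 3]
# Table XI: (connecting number, salary, disease).
SENSITIVE = [
    (1, 6000, "Headache"), (2, 4000, "Depression"), (3, 2000, "Depression"),
    (1, 6000, "Paranoia"), (4, 5000, "Catatonia"), (3, 2000, "Paranoia"),
    (1, 6000, "Catatonia"), (3, 2000, "Insomnia"),
]


def salary_frequency(sensitive):
    """Case I: frequency of each salary set, counted in Table XI."""
    return Counter(salary for _, salary, _ in sensitive)


def reconstructed_disease_count(qi_numbers, sensitive, disease):
    """Case II: number of QI-tuples whose connecting number is linked to `disease`."""
    linked = {number for number, _, d in sensitive if d == disease}
    return sum(1 for number in qi_numbers if number in linked)


if __name__ == "__main__":
    print(salary_frequency(SENSITIVE)[6000])  # 3, as in Table IX
    print(reconstructed_disease_count(QI_NUMBERS, SENSITIVE, "Depression"))  # 4, not 2
```

The overcount arises because connecting number 3 is shared by three QI-tuples, and the Case II procedure attributes "Depression" to all of them.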

TABLE XV (a)

Frequency of Each Salary Set According to Connecting Numbers in Table IX

| Connecting Number | Salary(S1) | Salary Set Frequency |
| 1 | 6000 | 3 |
| 2 | 4000 | 1 |
| 3 | 2000 | 3 |
| 4 | 5000 | 1 |

TABLE XV (b)

People who are Sick with Depression Disease According to Connecting Numbers

| Connecting Numbers | Disease | Sex | Age | Zip code |
| 1 |  |  | [25-30] | 66*** |
| 2 | Depression |  | [25-30] | 66*** |
| 3 | Depression |  | [25-30] | 66*** |
| 1 |  |  | [25-30] | 66*** |
| 4 |  |  | [35-40] | 6**** |
| 3 | Depression |  | [35-40] | 6**** |
| 1 |  |  | [35-40] | 6**** |
| 3 | Depression |  | [35-40] | 6**** |

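TABLE XIII

The Microdata

| Tuple ID | Age | Sex | Zipcode | Disease |
| 1 (Ali) |  |  | 11500 | pneumonia |
| 2 |  |  | 13200 | dyspepsia |
| 3 |  |  | 59300 | dyspepsia |
| 4 |  |  | 12700 | pneumonia |
| 5 |  |  | 54600 | flu |
| 6 |  |  | 25200 | gastritis |
| 7 (Hoda) |  |  | 25100 | flu |
| 8 |  |  | 31000 | bronchitis |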


B. The Enhanced Proposed Model

The authors provide a solution to the present problem, explained in sub-section V.A, by adding a frequency details column (similar to the Count column of the anatomy ST, Table XIV (b)). This column gives the exact frequency, as in the original table, of each of the remaining sensitive attributes, that is, all sensitive attributes except the one used as the basis for connecting numbers.

The frequency details column is used only as a guide for researchers, informing them of the frequency of the sensitive attributes in the original table (except the one used as the basis for connecting numbers), which supports the accuracy of research results.
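A minimal Python sketch of the enhanced sensitive table follows. It assumes that the frequency details are stored per row as the number of tuples in the original table (Table IX) that carry the same Disease value; this per-row layout is an assumption, since the values of Table XVII are not reproduced here, and the function name `with_frequency_details` is hypothetical.

```python
from collections import Counter

# Sensitive part of the original Table IX: (connecting number, salary, disease).
ORIGINAL = [
    (1, 6000, "Headache"), (2, 4000, "Depression"), (3, 2000, "Depression"),
    (1, 6000, "Paranoia"), (4, 5000, "Catatonia"), (3, 2000, "Paranoia"),
    (1, 6000, "Catatonia"), (3, 2000, "Insomnia"),
]


def with_frequency_details(rows):
    """Add to each row the original-table frequency of its disease.

    Salary is the basis of the connecting numbers, so only the disease
    (the remaining sensitive attribute) receives a frequency detail.
    Assumption: one frequency value per row, laid out as
    (connecting number, frequency, salary, disease).
    """
    frequency = Counter(disease for _, _, disease in rows)
    return [(number, frequency[disease], salary, disease)
            for number, salary, disease in rows]


if __name__ == "__main__":
    for row in with_frequency_details(ORIGINAL):
        print(row)
    # The Depression rows carry frequency 2, the value in the original table.
```

With this column, a researcher reads the frequency of "Depression" directly instead of reconstructing it through the connecting numbers.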

C. Applying the Proposed Solution

According to the enhanced proposed model, the solution can be implemented as in the next two tables (XVI & XVII):

• First, Table XVI represents the QI-tuples with connecting numbers; it is identical to the published table (Table X) in [1], without any changes.

• Second, Table XVII represents the sensitive attributes with frequency details. In this table, the frequency details present only the frequency of the sensitive attributes (except the one used as the basis for connecting numbers in the original table), regardless of the connecting numbers or the linked salary category. This table differs from the sensitive attributes table (Table XI) in [1] by the added frequency details column, which helps researchers determine the frequency of every sensitive attribute exactly. The proposed model architecture is presented in Figure III.

TABLE XVI

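QI-Tuples with Connecting Numbers

| Connecting Numbers | Sex | Age | Zip code |
| 1 |  | [25-30] | 66*** |
| 2 |  | [25-30] | 66*** |
| 3 |  | [25-30] | 66*** |
| 1 |  | [25-30] | 66*** |
| 4 |  | [35-40] | 6**** |
| 3 |  | [35-40] | 6**** |
| 1 |  | [35-40] | 6**** |
| 3 |  | [35-40] | 6**** |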

TABLE XVII

Sensitive Data with Connecting Numbers & Frequency Details

| Connecting Numbers | Frequency Details for Disease(S2) | Salary(S1) | Disease(S2) |
| 1 |  | 6000 | Headache |
| 2 |  | 4000 | Depression |
| 3 |  | 2000 | Depression |
| 1 |  | 6000 | Paranoia |
| 4 |  | 5000 | Catatonia |
| 3 |  | 2000 | Paranoia |
| 1 |  | 6000 | Catatonia |
| 3 |  | 2000 | Insomnia |

FIGURE III

Proposed Technique Architecture

VI. CONCLUSION AND FUTURE WORK

This paper proposed a solution to a problem that may occur in our previously proposed (α, β, k)-anonymity model [1]. Although the previous model has a positive effect on the privacy of multiple sensitive attributes and helps anonymized data resist background knowledge attacks effectively, one problem may occur. This problem arises if a researcher tries to determine the exact frequency of the remaining sensitive attributes (except the one used as the basis for connecting numbers) without considering all sensitive attributes in the same tuple together as a set. In other words, the computed frequency of any one of the remaining sensitive attributes differs from the frequency of the same attribute in the original table. The authors solve this problem by adding frequency details to the sensitive attributes table. By adding frequency details, the authors address the data utility problem and make the model more effective for both data privacy and data utility. Frequency details improve research accuracy and help researchers answer some important questions, especially those involving the frequency of any sensitive attribute in the original data table. In future work, the authors intend to solve the same problem using a hash function technique.

REFERENCES

[1] Abou_el_ela Abdou Hussien, "An Effective Privacy Preserving Model for Databases Using (α, β, k)-Anonymity Model and Lossy Join", International Journal of Computer Networking, Wireless and Mobile Communications, Vol. 3, Issue 1, pp. 389-400, Mar. 2013.

[2] Mohammed J. Zaki, Limsoon Wong, "Data Mining Techniques", SPC/Lecture Notes Series: zaki-chap, August 9, 2003.

[3] Xingquan Zhu, Ian Davidson, "Knowledge Discovery and Data Mining: Challenges and Realities", ISBN, Hershey, New York, 2007.

[4] Joseph Zernik, "Data Mining as a Civic Duty – Online Public Prisoners Registration Systems", International Journal on Social Media: Monitoring, Measurement, Mining, Vol. 1, pp. 84-96, September 2010.

[5] Kaidi Zhao, Bing Liu, Thomas M. Tirpak, Weimin Xiao, "A Visual Data Mining Framework for Convenient Identification of Useful Knowledge", ICDM'05: Proceedings of the Fifth IEEE International Conference on Data Mining, Vol. 1, pp. 530-537, December 2005.

[6] Venkatadri M. and Lokanatha C. Reddy, "A Comparative Study on Decision Tree Classification Algorithm in Data Mining", International Journal of Computer Applications in Engineering, Technology and Sciences (IJCAETS), Vol. 2, pp. 24-29, Sept. 2010.

[7] Sara Hajian, "Simultaneous Discrimination Prevention and Privacy Protection in Data Publishing and Mining", A Dissertation Submitted to the Department of Computer Engineering and Mathematics of Universitat Rovira i Virgili, 28 Jun. 2013.

[8] Jagriti Singh, S. S. Sane, "Discrimination Discovery and Prevention in Data Mining", International Journal of Engineering Sciences & Research Technology, Vol. 3, June 2014.

[9] Abou_el_ela Abdou Hussien, Nermin Hamza, Hesham A. Hefny, "Attacks on Anonymization-Based Privacy-Preserving: A Survey for Data Mining and Data Publishing", Journal of Information Security (JIS), Vol. 4, pp. 101-112, April 2013.

[10] P. Samarati and L. Sweeney, "Protecting Privacy When Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression", Technical Report SRI-CSL-98-04, 1998.

[11] Ke Wang, Benjamin C. M. Fung, "Anonymizing Sequential Releases", KDD’06, Philadelphia, Pennsylvania, USA, August 20–23, 2006.

[12] Nidhi Maheshwarkar, Kshitij Pathak, Vivekananda Chourey, "Performance Issues of Various K-anonymity Strategies", International Journal of Computer Technology and Electronics Engineering (IJCTEE), ISSN, 2011.

[13] Pierangela Samarati, Latanya Sweeney, "Protecting Privacy when Disclosing Information: K-Anonymity and its Enforcement through Generalization and Suppression", Special Issue of International Journal of Computer Applications on Optimization and On-chip Communication, Vol. 10, Feb. 2012.

[14] Nidhi Maheshwarkar, Kshitij Pathak, Narendra S. Choudhari, "K-anonymity Model for Multiple Sensitive Attributes", Special Issue of International Journal of Computer Applications on Optimization and On-chip Communication, Vol. 10, Feb. 2012.

[15] Nagendra Kumar S., Aparna R., "Sensitive Attributes based Privacy Preserving in Data Mining using k-anonymity", International Journal of Computer Applications, December 2013.

[16] Abou_el_ela Abdo Hussein, Nagy Ramadan Darwish, Hesham A. Hefny, "Multiple-Published Tables Privacy-Preserving Data Mining: A Survey for Multiple-Published Tables Techniques", (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, 2015.

[17] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "L-diversity: Privacy beyond k-anonymity", In Proc. 22nd Conf. Data Engg. (ICDE), p. 24, 2006.

[18] Yan Zhao, Jian Wang, Yongcheng Luo, Jiajin Le, "(α, β, k)-anonymity: An Effective Privacy Preserving Model for Databases", International Conference on Test and Measurement, 2009.

[19] Raymond Chi-Wing Wong, Yubao Liu, Jian Yin, Zhilan Huang, Ada Wai-Chee Fu, and Jian Pei, "(α, k)-anonymity Based Privacy Preservation by Lossy Join", Lecture Notes in Computer Science, pp. 733-744, 2007.

[20] X. Xiao and Y. Tao, "Anatomy: Simple and Effective Privacy Preservation", In VLDB, 2006.

[21] Xianmang He, Yanghua Xiao, Yujia Li, Qing Wang, Wei Wang, Baile Shi, "Permutation Anonymization: Improving Anatomy for Privacy Preservation in Data Publication", Lecture Notes in Computer Science, Vol. 7104, pp. 111-123, 2012.
