Fig 1 - uploaded by Gábor György Gulyás
Example providing insights on de-anonymization and identity separation (left: auxiliary network G_src; right: sanitized network G_tar).

Source publication
Conference Paper
Full-text available
Social networks allow their users to mark their profile attributes and relationships as private in order to guarantee privacy, although private information occasionally gets published within sanitized datasets offered to third parties, such as business partners. Today, powerful de-anonymization attacks exist that enable the finding of corresponding no...

Contexts in source publication

Context 1
... instance, an attacker may obtain datasets as depicted in Fig. 1, wishing to learn an otherwise inaccessible private attribute: who prefers tea or coffee (dashed or thick-bordered nodes). She initializes the seed set by re-identifying (or mapping) v_Alice ↔ v_7 and v_Bob ↔ v_3, as they have globally the highest degree in both networks (global re-identification phase). Next, she looks for nodes with ...
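The global re-identification step described in this context can be sketched as follows. This is a minimal illustration, assuming undirected graphs stored as adjacency dictionaries; the function name `top_degree_seeds`, the parameter `k`, and both toy graphs are hypothetical and do not reproduce the networks of Fig. 1.

```python
def top_degree_seeds(adj_src, adj_tar, k=2):
    """Pair the k highest-degree nodes of both graphs, rank by rank.

    Graphs are dicts of the form {node: set(neighbors)}.
    """
    def ranked(adj):
        # Descending degree; ties are broken by node name for determinism.
        return sorted(adj, key=lambda v: (-len(adj[v]), str(v)))[:k]

    return dict(zip(ranked(adj_src), ranked(adj_tar)))


# Hypothetical toy graphs (assumed for illustration only).
g_src = {
    "Alice": {"Bob", "Carol", "Dave", "Ed"},
    "Bob": {"Alice", "Carol", "Dave"},
    "Carol": {"Alice", "Bob"},
    "Dave": {"Alice", "Bob"},
    "Ed": {"Alice"},
}
g_tar = {
    "v7": {"v3", "v1", "v2", "v5"},
    "v3": {"v7", "v1", "v2"},
    "v1": {"v7", "v3"},
    "v2": {"v7", "v3"},
    "v5": {"v7"},
}

seeds = top_degree_seeds(g_src, g_tar, k=2)
print(seeds)
```

Rank-by-rank pairing only works when the top degrees are distinct in both graphs; with ties or perturbed edges the pairing can be wrong, so this is best read as a starting point rather than a robust seeding method.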
Context 2
... present effect of identity separation by the example of Fred in Fig. 1, who created two unlinkable profiles: v_8 to pose as a coffee fan towards his closer friends (Alice, Ed, Greg), and v_12 to maintain relationships with tea lovers (Harry, Jennie). By applying the attack algorithm, it can be seen that the hidden drink preference of Fred will not be discovered by third ...
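At the graph level, the split described for Fred amounts to replacing one node with several nodes that share no edge and divide the original neighbors among themselves. The sketch below assumes an adjacency-dictionary representation; `separate_identity` and the star-shaped toy graph are hypothetical, and only Fred's own contacts are taken from the context above.

```python
def separate_identity(adj, node, partitions):
    """Return a copy of adj in which `node` is replaced by separated identities.

    partitions maps each new identity to the set of neighbors it keeps,
    e.g. {"v8": {"Alice", "Ed", "Greg"}, "v12": {"Harry", "Jennie"}}.
    """
    # Copy the graph without the original node and without edges pointing to it.
    new_adj = {v: set(nbrs) - {node} for v, nbrs in adj.items() if v != node}
    for new_id, kept in partitions.items():
        new_adj[new_id] = set()
        for n in kept:
            if n in adj[node]:  # only edges that existed before the split
                new_adj[new_id].add(n)
                new_adj[n].add(new_id)
    return new_adj


# Toy graph: only Fred's contacts are modeled (other edges are omitted).
g = {
    "Fred": {"Alice", "Ed", "Greg", "Harry", "Jennie"},
    "Alice": {"Fred"},
    "Ed": {"Fred"},
    "Greg": {"Fred"},
    "Harry": {"Fred"},
    "Jennie": {"Fred"},
}

g_sep = separate_identity(
    g, "Fred", {"v8": {"Alice", "Ed", "Greg"}, "v12": {"Harry", "Jennie"}}
)
print(sorted(g_sep["v8"]), sorted(g_sep["v12"]))
```

Because the two new identities have smaller degrees and disjoint neighborhoods, a purely structural attack has less evidence linking either of them to the single node it knows from the auxiliary network.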

Similar publications

Article
Full-text available
The paradigm of communication “anywhere, any way and at any time” of mobile and universal computing extends to “anything, any person and any service” with the Internet of Things (IoT). There are more and more users adopting these technologies that generate significant amounts of information. However, there are individuals and institutions (public a...

Citations

... This can also be applied to social networks to segregate information with different groups of contacts [20]. In our previous works, we have proposed models for applying identity separation to social networks [14,20] and also provided the analysis of identity separation against re-identification at a model level [16,18]. In these evaluations, users adopted identity separation solely on their own, and in some of the settings they cooperated to stop the attack. ...
... We build on our previous work, as we use the identity separation models from [14]. Our aim is to provide guarantees compared to previous models of identity separation [16,18]. We analyze identity separation from an individual point of view, where the goal is to minimize possible private data leakage. ...
... Previously, we have analytically shown that identity separation is an effective tool against clique-based seeding mechanisms [14]. We furthermore analyzed the protective strength of identity separation against the propagation phase of Nar09 with simulation on datasets obtained from three different social networks [16,18]. We have shown that it is possible to stop the re-identification attacks with just 3-50% of users adopting identity separation (the number of participants depends on whether users are cooperating or not), and it is possible to effectively hide information even for a few nodes. ...
Article
Full-text available
Connections between users of social networking services pose a significant privacy threat. Recently, several social network de-anonymization attacks have been proposed that can efficiently re-identify users at large scale, solely considering the graph structure. In this paper, we consider these privacy threats and analyze de-anonymization attacks at the model level against a user-controlled privacy-enhancing technique called identity separation. The latter allows creating seemingly unrelated identities in parallel, even without the consent of the service provider or other users. It has been shown that identity separation can be used efficiently against re-identification attacks if users cooperate with each other. However, while participation would be crucial, this cannot be guaranteed in a real-life scenario. Therefore, we introduce the y-identity model, in which the user creates multiple separated identities and assigns the sensitive attribute to one of them according to a given strategy. For this, we propose a strategy to be used in real-life situations and formally prove that there is an upper bound for the expected privacy loss which is sufficiently low.
... Unfortunately, it does not implement several important functions that are necessary for large-scale experimentation, such as synthetic dataset perturbation (e.g., as in [19]), seed generation algorithms, and detailed evaluation of results (e.g., saving every minute detail while attacks are executed). SALab had these functions and attacks implemented already, as previous versions of it have been used in multiple works: for the analysis of the importance of seeding [9], for measuring anonymity [7,10], for evaluating the Grasshopper attack [5], and for analyzing numerous settings for adopting identity separation for protecting privacy [6,8,10]. Therefore, we decided to choose SALab as our main tool. ...
Article
Releasing connection data from social networking services can pose a significant threat to user privacy. In our work, we consider structural social network de-anonymization attacks, which are used when a malicious party uses connections in a public or other identified network to re-identify users in an anonymized social network release that he obtained previously. In this paper we design and evaluate a novel social de-anonymization attack. In particular, we argue that the similarity function used to re-identify nodes is a key component of such attacks, and we design a novel measure tailored for social networks. We incorporate this measure in an attack called Bumblebee. We evaluate Bumblebee in depth, and show that it significantly outperforms the state of the art: for example, it has higher re-identification rates with high precision, robustness against noise, and also better error control.
... Previously, we have analytically shown that identity separation is an effective tool against clique-based seeding mechanisms [13]. More recently, we also analyzed the protective strength of non-cooperative identity separation against the propagation phase of Nar09 [15] with simulation on datasets obtained from three different social networks. We have shown that while almost half of the users are required to repel the attack (and retain network privacy), it is possible to effectively hide information from the attacker even for a few nodes if the proper settings are applied. ...
... Recently we have shown that seeding parameters are an important aspect of the de-anonymization procedure, as they have a significant effect on the overall results [16]. Thus, they should be detailed both for comparing new attack schemes (e.g., [23]) and for evaluating protection mechanisms (e.g., [7,15]). Therefore, we analyzed our findings from this aspect, too. ...
... However, some data on ego networks, which is a similar functionality to identity separation, is available from Google+, Twitter and Facebook [3]. ... data, we found that the number of circles has a power-law distribution, and duplication of connections across contact groups is not widely used [15]. As we cannot draw strong conclusions from these observations regarding identity separation (as the data lacks patterns on the use of hidden connections), we use the probability-based models we previously introduced in [13] for deriving test data from real-world datasets featuring identity separation. ...
Table 1: Recall rates were proportional to the overlap between G_src and G_tar: the less perturbation is used (resulting in higher overlaps), the higher the recall rates are.
Article
Full-text available
Due to the nature of the data that is accumulated in social networking services, there is a great variety of data-driven uses. However, private information occasionally gets published within sanitized datasets offered to third parties. In this paper we consider a strong class of de-anonymization attacks that can re-identify these datasets using structural information crawled from other networks. We provide the model-level analysis of a technique called identity separation that could be used for hiding information even from these attacks. We show that in the case of non-collaborating users, ca. 50% of them need to adopt the technique in order to tackle re-identification over the network. We additionally highlight several settings of the technique that allow preserving privacy on the personal level. In the second part of our experiments we evaluate a measure of anonymity, and show that if users with low anonymity values apply identity separation, the minimum adoption rate for repelling the attack drops to 3-15%. Additionally, we show that it is necessary for top-degree nodes to participate.
... Previously, we have analytically shown that identity separation is an effective tool against clique-based seeding mechanisms [11]. In subsequent works, we analyzed the protective strength of identity separation against the propagation phase of Nar09 [13,15] with simulation on datasets obtained from three different social networks. In [13] we analyzed the non-cooperative setting, and we have shown that while almost half of the users are required to repel the attack (and retain network privacy), it is possible to effectively hide information from the attacker even for a few nodes if the proper settings are applied. ...
... In subsequent works, we analyzed the protective strength of identity separation against the propagation phase of Nar09 [13,15] with simulation on datasets obtained from three different social networks. In [13] we analyzed the non-cooperative setting, and we have shown that while almost half of the users are required to repel the attack (and retain network privacy), it is possible to effectively hide information from the attacker even for a few nodes if the proper settings are applied. In [15] we analyzed the cooperative setting organized according to the importance of nodes, based on their anonymity values. ...
... Finally, we have shown that seeding parameters are an important aspect of the de-anonymization procedure, as they have a significant effect on the overall results [14]. Thus, they should be detailed both for comparing new attack schemes (e.g., [23]) and for evaluating protection mechanisms (e.g., [5,13]). Therefore, we analyzed our findings from this aspect, too. ...
Article
Full-text available
Social networks have an important and possibly key role in our society today. In addition to the benefits, serious privacy concerns also emerge: there are algorithms called de-anonymization attacks that are capable of re-identifying large fractions of anonymously published networks. A strong class of these attacks uses solely the network structure to achieve its goals. In this paper we propose a novel structural de-anonymization attack called Grasshopper. By measurements we compare Grasshopper to the state-of-the-art algorithm, and highlight its enhanced capabilities, such as having negligible error rates and reaching yield levels that were not possible before in cases when there is greater noise in the background knowledge. We furthermore evaluate an anonymity measure for the Grasshopper algorithm which enables the approximate ranking of nodes according to their re-identification rates. Finally, we characterize the robustness of Grasshopper in tackling identity separation, a privacy-enhancing technique that facilitates the hiding of structural information.
... Although multiple adaptations of the Nar09 algorithm exist [2]-[7], and other works use the attack for simulation-based evaluation of privacy-enhancing features [8]-[10], an important aspect of the attacker model is often neglected: how changing the seeding method influences the performance of the propagation. In our work we aim to fill this gap by analyzing multiple methods on different networks, and also by including related works discussing this topic [1], [5], [8]. ...
... For instance, Narayanan and Shmatikov describe how they used 4-clique seeding consisting of high-degree nodes [1], but in another work [3], it is not detailed how seeds were selected during the evaluation of the propagation phase (i.e., the nodes that the injected subgraph is connected to). Similarly, protection mechanisms such as [9], [10] should be evaluated against attackers capable of using multiple seeding methods. ...
... The original paper used high-degree nodes for seeding that formed 4-cliques [1] (in their main experiment they used seed nodes with a degree of at least 80), while another work used nodes from 4-cliques regardless of degree [8] for smaller networks. Several other seeding methods appeared in the literature, such as matching top nodes [2], [9], (presumably) sampling random nodes in [3], and seeds selected randomly from the top 25% of high-degree nodes [10]. ...
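Two of the seeding variants listed above can be sketched as candidate selectors over a single graph. This is an illustrative sketch, not the procedure of any cited work: the helper names, the brute-force clique search, and the toy graph are all assumptions, and the attacker would still need to map the selected candidates onto the other network.

```python
import math
import random
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def random_top_fraction_seeds(adj, n_seeds, fraction=0.25, rng=None):
    """Sample seed candidates uniformly from the top `fraction` of nodes by degree."""
    rng = rng or random.Random(0)  # fixed default seed keeps the sketch reproducible
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), str(v)))
    pool = ranked[:max(1, math.ceil(len(ranked) * fraction))]
    return rng.sample(pool, min(n_seeds, len(pool)))


def high_degree_4_cliques(adj, min_degree):
    """List all 4-cliques whose members have degree >= min_degree.

    Brute force over 4-subsets, so only suitable for small toy graphs.
    """
    candidates = sorted(v for v in adj if len(adj[v]) >= min_degree)
    return [quad for quad in combinations(candidates, 4)
            if all(b in adj[a] for a, b in combinations(quad, 2))]


# Hypothetical toy graph: a 4-clique {a, b, c, d}, each member with one extra leaf.
toy = from_edges([
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("a", "e"), ("b", "f"), ("c", "g"), ("d", "h"),
])

pool_seeds = random_top_fraction_seeds(toy, n_seeds=2)
clique_seeds = high_degree_4_cliques(toy, min_degree=4)
print(pool_seeds, clique_seeds)
```

Setting `min_degree=0` gives the degree-agnostic 4-clique variant mentioned for smaller networks, which shows how much of the difference between seeding methods comes down to a few parameters.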
Conference Paper
Full-text available
Social networks allow their users to make their profiles and relationships private. However, in recent years several powerful de-anonymization attacks have been proposed that are able to map corresponding nodes within two seemingly unrelated datasets solely by considering structural information (e.g., crawls of public social networks and datasets published after sanitization). These algorithms consist of two parts: initial selection of seed nodes and then a propagation phase. In related papers, several seeding procedures are proposed, although detailed comparison is often left unexplored, i.e., how one method differs from the others with respect to the overall outcome of the algorithm. In this paper, besides discussing the existing analysis of seeding methods, we experimentally analyze how different seed selection algorithms perform compared to each other, and we highlight significant differences emerging even in the same or in structurally divergent networks.
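The two-part structure described in this abstract (seeds first, then propagation) can be illustrated with a deliberately simplified propagation loop. This is not the Nar09 algorithm or any other cited attack: the scoring rule (count of already-mapped common neighbors), the `margin` acceptance test, and the toy graphs are assumptions chosen to keep the sketch short.

```python
def propagate(adj_src, adj_tar, seeds, margin=1):
    """Extend a seed mapping {src_node: tar_node} using neighbor evidence.

    A source node is mapped to the target candidate that shares the most
    already-mapped neighbors, provided it beats the runner-up by `margin`.
    """
    mapping = dict(seeds)
    used = set(mapping.values())
    changed = True
    while changed:  # repeat until a full pass adds no new mapping
        changed = False
        for u in sorted(adj_src):
            if u in mapping:
                continue
            scores = {}
            for n in adj_src[u]:
                if n in mapping:
                    # Every unmapped target neighbor of the mapped node is a candidate.
                    for cand in adj_tar[mapping[n]]:
                        if cand not in used:
                            scores[cand] = scores.get(cand, 0) + 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if best_score - runner_up >= margin:  # skip ambiguous candidates
                mapping[u] = best
                used.add(best)
                changed = True
    return mapping


# Hypothetical toy graphs with identical structure (no perturbation).
g_src = {"A": {"B", "C", "D"}, "B": {"A", "C", "E"}, "C": {"A", "B"},
         "D": {"A"}, "E": {"B"}}
g_tar = {"t1": {"t2", "t3", "t4"}, "t2": {"t1", "t3", "t5"}, "t3": {"t1", "t2"},
         "t4": {"t1"}, "t5": {"t2"}}

result = propagate(g_src, g_tar, seeds={"A": "t1", "B": "t2"})
print(result)
```

The loop makes the dependence on seeding visible: with no seeds nothing is ever scored, and with poorly chosen seeds the margin test rejects most candidates, so the mapping never spreads.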
... In addition to location data, the knowledge of the social network, as shown by the works of Srivatsa and Hicks and Sharad and Danezis, can be used as side information to help in performing the de-anonymization. However, it might be possible to mitigate this risk of re-identification by sanitizing the social graph before releasing it [GI13]. The use of geolocation applications is becoming an everyday habit as smart cities are more and more connected to mainstream cell phones with GPS capabilities. ...
Article
Full-text available
In recent years, we have observed the development of connected and nomadic devices such as smartphones, tablets or even laptops, allowing individuals to use location-based services (LBSs), which personalize the service they offer according to the positions of users, on a daily basis. Nonetheless, LBSs raise serious privacy issues, which are often not perceived by the end users. In this thesis, we are interested in understanding the privacy risks related to the dissemination and collection of location data. To address this issue, we developed inference attacks such as the extraction of points of interest (POI) and their semantics, the prediction of the next location, and the de-anonymization of mobility traces, based on a mobility model that we have called the mobility Markov chain. Afterwards, we proposed a classification of inference attacks in the context of location data based on the objectives of the adversary. In addition, we evaluated the effectiveness of some sanitization measures in limiting the efficiency of inference attacks. Finally, we have developed a generic platform called GEPETO (for GEoPrivacy Enhancing Toolkit) that can be used to test the developed inference attacks.
Article
Full-text available
Recently, a huge number of social networks have been made publicly available. In parallel, several definitions and methods have been proposed to protect users’ privacy when publicly releasing these data. Some of them were adopted from relational dataset anonymization techniques, which are more mature than network anonymization techniques. In this paper we summarize privacy-preserving techniques, focusing on graph-modification methods which alter the graph’s structure and release the entire anonymous network. These methods allow researchers and third parties to apply all graph-mining processes on anonymous data, from local to global knowledge extraction.