A Two-dimensional Support Vector Machine [14].


Context in source publication

Context 1
... framework. SVM aims to find a hyperplane in an m-dimensional space (with m the number of attributes) that distinctly classifies the data points. To separate two classes of data points, there are many possible hyperplanes; the goal is to find the one with the maximum margin, i.e., the maximum distance between the data points of the two classes. Fig. 1 shows an SVM for a linearly separable binary classification problem. The w and b that solve the following optimization problem determine the ...
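The excerpt breaks off before the optimization problem itself; for reference, the standard hard-margin formulation for a linearly separable problem (a reconstruction of the usual textbook form, not necessarily the exact equation in the source) is

\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} \quad \text{subject to} \quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n,

where y_i ∈ {−1, +1} is the class label of point x_i; the resulting hyperplane w · x + b = 0 separates the two classes with margin 2/‖w‖.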

Similar publications

Chapter
Full-text available
The privatization of education extends beyond the sponsoring of school events, the production of teaching materials by private content providers, and the growing market of commercial tutoring providers. Privatization tendencies are also expressed in various education-policy debates, be it the discussions about the "Sc...

Citations

... Naïve Bayes classifier over encrypted data: To securely construct a Naïve Bayes classifier over secret shares of data, these probabilities should be computed for all possible attribute values and class labels [53]. This requires the secure computation of the following counts over the whole distributed data: the total number of records; the total number of records of a given class that have a given value for a given attribute; and the total number of records of a given class. ...
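As a plaintext illustration of the counts described above (in the cited work these would be computed securely over secret-shared data; the function and variable names below are purely illustrative, not the protocol's API):

from collections import Counter, defaultdict

def naive_bayes_counts(records, labels):
    """records: list of dicts {attribute: value}; labels: list of class labels."""
    n_total = len(records)                 # total number of records
    n_class = Counter(labels)              # number of records per class
    n_attr_class = defaultdict(int)        # records per (attribute, value, class)
    for rec, c in zip(records, labels):
        for attr, val in rec.items():
            n_attr_class[(attr, val, c)] += 1
    return n_total, n_class, n_attr_class

def conditional_prob(attr, val, c, n_class, n_attr_class):
    # P(attribute = val | class = c), estimated from the counts above
    return n_attr_class[(attr, val, c)] / n_class[c]

The prior P(class = c) is likewise n_class[c] / n_total, so these three counts suffice to train the classifier.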
... SVM over encrypted data: To construct an SVM classifier over secret shares of data, the kernel values K(x_i, x_j) must be computed for all pairs i, j, where x_i and x_j are given in protected format (secret shares), and the model parameters should optimally be computed by solving the optimization problem presented in Eq. 15 [53]. Also, we assume that the kernel function is linear, i.e., K(x_i, x_j) = x_i · x_j. ...
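For the assumed linear kernel, the quantities needed are simply the pairwise dot products, i.e., the Gram matrix; a plaintext sketch of what the protocol in [53] would evaluate on secret shares (the function name is illustrative):

import numpy as np

def linear_gram_matrix(X):
    """X: (n_samples, n_features) array of training points."""
    X = np.asarray(X, dtype=float)
    # Entry (i, j) is the dot product x_i · x_j = K(x_i, x_j) for a linear kernel.
    return X @ X.T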
... The Naïve Bayes and SVM classifiers have been trained in an encryption-based setting using 32-bit arithmetic sharing in the ABY framework. The experiments were performed on a single machine running Ubuntu 18.04 LTS with a 64-bit microprocessor and 16 GB of RAM, with an Intel Core i7-4770 at 3.40 GHz x 8 [53]. The total computation and communication costs of training the Naïve Bayes and SVM classifiers on the Adult dataset are shown in Table 5 (for four data providers). ...
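For context, arithmetic sharing represents each value as additive shares modulo 2^32; a minimal plaintext sketch of that representation for four data providers (this is not ABY's actual API, only an illustration of the sharing scheme named above):

import secrets

MOD = 2 ** 32  # 32-bit arithmetic sharing

def share(value, n_parties=4):
    # Split `value` into additive shares modulo 2^32; any n-1 shares reveal nothing.
    parts = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def reconstruct(parts):
    return sum(parts) % MOD

assert reconstruct(share(12345)) == 12345  # example: share and recombine a value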
Chapter
Cyber-physical systems (CPS) are smart computer systems that control or monitor machines through computer-based algorithms and are vulnerable to both cyber and physical threats. Like a growing number of applications, CPS also employ classification algorithms as a tool for data analysis and continuous monitoring of the system. While the utility of data is significantly important for building an accurate and efficient classifier, free access to the original (raw) data is a crucial challenge due to privacy constraints. Therefore, it is tremendously important to train classifiers in a private setting in which the privacy of individuals is protected, while the data remains practically useful for building the model. In this chapter, we investigate the application of three privacy-preserving models, namely anonymization, Differential Privacy (DP), and cryptography, to privatize data, and evaluate the performance of two popular classifiers, Naïve Bayes and Support Vector Machine (SVM), over the protected data. Their performance is compared in terms of accuracy and training costs, on the same data and in the same private environment. Finally, comprehensive findings on constructing privacy-preserving classifiers are outlined. Attack models against the training data and against the private classifier models are also discussed.
... Few studies in the literature compare the performance of different classifiers in a private setting. The performance of different classifiers trained over private inputs using secure two-party computation [19], anonymization techniques (e.g., k-anonymity) [4], and differential privacy [14] has already been explored. In all of these studies, the impact of the dataset, the inherent properties of the classifiers, and the privacy requirements on the performance of private classifiers has been investigated. ...
... LDP-based Decision Tree Classifier
input : D: dataset {(X1, c1), ..., (Xn, cn)}; A: set of feature domains A = {A1, ..., Ak}; d: depth of the tree; IF: information gain algorithm; parent: start node; l: LDP mechanism; ϵ: privacy budget
output : ...
10  lab ← label of max(count_j);
11  lf ← leaf with value x*_i and label lab;
12  attach lf to parent;
    /* Create a node for every value of the feature and rerun the algorithm from that node */
13  else
14  foreach x*_i ∈ A_j do
15      no ← node with value x*_i;
16      attach no to parent;
        /* The semi-trusted party returns D_t */
17      D_t ← all data from D where A_j has value x*_i;
18      A_t ← A without A_j;
19      LDP-based Decision Tree Classifier(D_t, A_t, d−1, IF, no, l, ϵ);
...
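As a rough plaintext sketch of the recursion in the excerpt above (Laplace-perturbed counts merely stand in for the LDP mechanism l, the feature-scoring step and the semi-trusted party are omitted, and all names are illustrative):

import numpy as np
from collections import Counter

def noisy_counts(labels, epsilon):
    # Laplace-perturbed class counts as a simple stand-in for an LDP count estimate
    counts = Counter(labels)
    return {c: n + np.random.laplace(0.0, 1.0 / epsilon) for c, n in counts.items()}

def build_tree(records, labels, features, depth, epsilon):
    """records: list of dicts {feature: value}; features: dict {feature: domain}."""
    if depth == 0 or not features or not labels:
        counts = noisy_counts(labels, epsilon)
        label = max(counts, key=counts.get) if counts else None  # noisy majority label
        return {"leaf": label}
    # A real implementation would choose the feature with a privately computed
    # information-gain score (IF in the excerpt); here we simply take the first one.
    feat = next(iter(features))
    remaining = {f: dom for f, dom in features.items() if f != feat}
    node = {"feature": feat, "children": {}}
    for value in features[feat]:
        idx = [i for i, r in enumerate(records) if r.get(feat) == value]
        node["children"][value] = build_tree(
            [records[i] for i in idx], [labels[i] for i in idx],
            remaining, depth - 1, epsilon)
    return node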
Chapter
Full-text available
In recent years, Local Differential Privacy (LDP), as a strong privacy-preserving methodology, has been widely deployed in real-world applications. It allows users to perturb their data locally on their own devices before it is sent out for analysis. In particular, LDP serves as an effective solution for the construction of privacy-preserving classifiers. While several approaches in the literature have been proposed to build classifiers over distributed locally differentially private data, an understanding of the difference in the performance of these LDP-based classifiers is currently missing. In this study, we investigate the impact of using LDP on four well-known classifiers, i.e., Naïve Bayes, Decision Tree, Random Forest, and Logistic Regression. We evaluate the impact of dataset properties, LDP mechanisms, the privacy budget, and the classifiers' structure on LDP-based classifiers' performance.
... While some work in the literature compares the impact of privacy in the context of classifier training, e.g., over encrypted data [24] and under differential privacy [14], to the best of our knowledge, no prior work has provided a comparison of the performance achieved by different classifiers when trained on different datasets before and after being anonymized. ...
Chapter
Full-text available
The problem of protecting datasets from the disclosure of confidential information, while published data remains useful for analysis, has recently gained momentum. To solve this problem, anonymization techniques such as k-anonymity, ℓ-diversity, and t-closeness have been used to generate anonymized datasets for training classifiers. While these techniques provide an effective means to generate anonymized datasets, an understanding of how their application affects the performance of classifiers is currently missing. This knowledge enables the data owner and analyst to select the most appropriate classification algorithm and training parameters in order to guarantee high privacy requirements while minimizing the loss of accuracy. In this study, we perform extensive experiments to verify how classifiers' performance changes when trained on an anonymized dataset compared to the original one, and evaluate the impact of classification algorithms, dataset properties, and anonymization parameters on classifiers' performance.
... Specifically, we showed how Naïve Bayes, SVM, and Decision Tree classifiers can be constructed in an ε-DP setting and compared their performance. While some work in the literature compares the impact of privacy in the context of classifier learning, e.g., the costs of training different classifiers using Homomorphic Encryption (Sheikhalishahi and Zannone, 2020), to the best of our knowledge no prior work has focused on the comparison of classifiers' performance in a differential privacy setting. ...
Article
Feature selection has become significantly important for data analysis. It selects the most informative features describing the data to filter out the noise, complexity, and over-fitting caused by less relevant features. Accordingly, feature selection improves predictors' accuracy, enables them to be trained faster and more cost-effectively, and provides a better understanding of the underlying data. While plenty of practical solutions have been proposed in the literature to identify the most discriminating features describing a dataset, an understanding of feature selection over privacy-sensitive data in the absence of a trusted party is still missing. The design of such a framework is especially important in our modern society, where each individual, through accessing the Internet, can simultaneously play the role of a data provider and a data-analysis beneficiary. In this study, we propose a novel feature selection framework based on Local Differential Privacy (LDP), named LDP-FS, which estimates the importance of features over securely protected data while protecting the confidentiality of each individual's data before it leaves the user's device. The performance of LDP-FS in terms of scoring and ordering the features is assessed by investigating the impact of dataset properties, privacy mechanisms, privacy levels, and feature selection techniques on this framework. The accuracy of classifiers trained on the subset of features selected by LDP-FS is also presented. Our experimental results demonstrate the effectiveness and efficiency of the proposed framework.
Article
Full-text available
The application of machine learning techniques to large and distributed data archives might result in the disclosure of sensitive information about the data subjects. Data often contain sensitive identifiable information, and even if these are protected, the excessive processing capabilities of current machine learning techniques might facilitate the identification of individuals, raising privacy concerns. To this end, we propose a decision-support framework for data anonymization, which relies on a novel approach that exploits data correlations, expressed in terms of relaxed functional dependencies (rfds) to identify data anonymization strategies providing suitable trade-offs between privacy and data utility. Moreover, we investigate how to generate anonymization strategies that leverage multiple data correlations simultaneously to increase the utility of anonymized datasets. In addition, our framework provides support in the selection of the anonymization strategy to apply by enabling an understanding of the trade-offs between privacy and data utility offered by the obtained strategies. Experiments on real-life datasets show that our approach achieves promising results in terms of data utility while guaranteeing the desired privacy level, and it allows data owners to select anonymization strategies balancing their privacy and data utility requirements.