
Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid


Abstract and Figures

Naive-Bayes induction algorithms were previously shown to be surprisingly accurate on many classification tasks even when the conditional independence assumption on which they are based is violated. However, most studies were done on small databases. We show that in some larger databases, the accuracy of Naive-Bayes does not scale up as well as decision trees. We then propose a new algorithm, NBTree, which induces a hybrid of decision-tree classifiers and Naive-Bayes classifiers: the decision-tree nodes contain univariate splits, as in regular decision trees, but the leaves contain Naive-Bayes classifiers. The approach retains the interpretability of Naive-Bayes and decision trees, while resulting in classifiers that frequently outperform both constituents, especially in the larger databases tested.
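As a rough illustration of the hybrid structure the abstract describes, here is a minimal sketch in Python. The class names are illustrative (not from the paper), and NBTree's actual split-selection criterion, which chooses splits by estimating the utility of the resulting Naive-Bayes leaves, is omitted; the sketch only shows how a univariate decision-tree node routes an example to a Naive-Bayes leaf.

```python
from collections import Counter, defaultdict

class NBLeaf:
    """Leaf node: a Laplace-smoothed Naive-Bayes model over categorical features."""
    def __init__(self, X, y):
        self.classes = sorted(set(y))
        self.priors = Counter(y)                # class counts
        self.n = len(y)
        # counts[c][j][v]: number of class-c examples with value v for feature j
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)          # observed values per feature
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[yi][j][v] += 1
                self.values[j].add(v)

    def predict(self, x):
        def score(c):
            p = self.priors[c] / self.n         # prior P(c)
            for j, v in enumerate(x):           # times smoothed P(v | c) per feature
                k = len(self.values[j])
                p *= (self.counts[c][j][v] + 1) / (self.priors[c] + k)
            return p
        return max(self.classes, key=score)

class SplitNode:
    """Internal node: a univariate equality split on one categorical feature."""
    def __init__(self, feature, children, default):
        self.feature, self.children, self.default = feature, children, default

    def predict(self, x):
        # route the example down the branch matching its value for the split feature
        return self.children.get(x[self.feature], self.default).predict(x)
```

Classification walks split nodes as in an ordinary decision tree, then hands the example to the Naive-Bayes model stored at the leaf instead of returning a majority class.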
[Figure: learning curves of accuracy vs. number of instances for Naive-Bayes and C4.5 on nine datasets: DNA, waveform-40, led24, shuttle, letter, adult, chess, mushroom, and satimage.]
[Figure: accuracy differences (NBTree - C4.5, NBTree - NB) and error ratios (NBTree/C4.5, NBTree/NB) across 28 datasets: tic-tac-toe, chess, letter, vehicle, vote, monk1, segment, satimage, flare, iris, led24, mushroom, vote1, adult, shuttle, soybean-large, DNA, ionosphere, breast (L), crx, breast (W), german, pima, heart, glass, cleve, waveform-40, glass2, and primary-tumor.]
... Dataset. Adult [10] contains a diverse set of attributes pertaining to individuals in the United States. The dataset is often used to predict whether an individual's annual income exceeds 50,000 dollars, making it a popular choice for binary classification tasks. ...
Preprint
Full-text available
The fairness of AI decision-making has garnered increasing attention, leading to the proposal of numerous fairness algorithms. In this paper, we aim not to address this issue by directly introducing fair learning algorithms, but rather by generating entirely new, fair synthetic data from biased datasets for use in any downstream tasks. Additionally, the distribution of test data may differ from that of the training set, potentially impacting the performance of the generated synthetic data in downstream tasks. To address these two challenges, we propose a diffusion model-based framework, FADM: Fairness-Aware Diffusion with Meta-training. FADM introduces two types of gradient induction during the sampling phase of the diffusion model: one to ensure that the generated samples belong to the desired target categories, and another to make the sensitive attributes of the generated samples difficult to classify into any specific sensitive attribute category. To overcome data distribution shifts in the test environment, we train the diffusion model and the two classifiers used for induction within a meta-learning framework. Compared to other baselines, FADM allows for flexible control over the categories of the generated samples and exhibits superior generalization capability. Experiments on real datasets demonstrate that FADM achieves better accuracy and optimal fairness in downstream tasks.
... Traditionally, methods for solving this task can be classified into two categories: those based on statistical models and those based on reinforcement learning (RL) models. Firstly, among the previous works based on statistical models, Bayesian-based models are particularly prominent, known for their low complexity [5], [6]. They define symptom probing as a feature selection task, using entropy functions to identify optimal features and maximize information gain as the training objective. ...
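The entropy-based symptom probing this excerpt describes reduces to computing the information gain of each candidate feature and asking about the one with the highest gain. A minimal illustration (function names are my own, not taken from the cited models):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Expected reduction in label entropy from observing one feature (e.g. a symptom)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [l for fv, l in zip(feature_values, labels) if fv == v]
        gain -= (len(subset) / n) * entropy(subset)  # weighted entropy after the split
    return gain
```

A Bayesian-style prober would then ask about the symptom maximizing this gain, condition on the answer, and repeat on the reduced candidate set until a diagnosis threshold is reached.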
Article
Full-text available
Automated diagnosis, as a temporary medical supplement, has gained significant attention in research in recent years. Existing methods employ sequence generation approaches to inquire about symptoms and diagnose diseases. However, these methods ignore the fact that: 1) doctors utilize their past experience and similar cases to aid in diagnosis in real-world scenarios; 2) doctors inquire about key symptoms that serve as vital diagnostic evidence within limited conversations. To address these issues, we propose an end-to-end model, KDPoG. First, in addition to using symptom and attribute embeddings, we propose patient-oriented graph-enhanced representation learning, built from a patient-oriented graph and learned with heterogeneous graph convolutional networks. Furthermore, on top of an encoder built with gated attention units, we propose knowledge-guided attention mechanism learning, which incorporates conditional probabilities of co-occurrence between symptom pairs. Finally, we use two linear layers as the classification module to perform symptom probing and disease diagnosis. We conduct extensive experiments on four public datasets, which demonstrate that our proposed model outperforms the state-of-the-art methods. We achieve an average absolute improvement of over 2% in disease diagnosis accuracy. In particular, on the Muzhi-10 dataset, we observe an absolute improvement of over 14.7% in symptom recall rate.
... AdultIncome dataset [16] contains income data extracted from the United States Census Bureau. The dataset includes 14 features related to personal data, such as race, age, and education. ...
Article
Feature selection is a widely studied technique whose goal is to reduce the dimensionality of the problem by removing irrelevant features. It has multiple benefits, such as improved efficacy, efficiency and interpretability of almost any type of machine learning model. Feature selection techniques may be divided into three main categories, depending on the process used to remove the features known as Filter, Wrapper and Embedded. Embedded methods are usually the preferred feature selection method that efficiently obtains a selection of the most relevant features of the model. However, not all models support an embedded feature selection that forces the use of a different method, reducing the efficiency and reliability of the selection. Neural networks are an example of a model that does not support embedded feature selection. As neural networks have shown to provide remarkable results in multiple scenarios such as classification and regression, sometimes in an ensemble with a model that includes an embedded feature selection, we attempt to embed a feature selection process with a general-purpose methodology. In this work, we propose a novel general-purpose layer for neural networks that removes the influence of irrelevant features. The Feature-Aware Drop Layer is included at the top of the neural network and trained during the backpropagation process without any additional parameters. Our methodology is tested with 17 datasets for classification and regression tasks, including data from different fields such as Health, Economic and Environment, among others. The results show remarkable improvements compared to three different feature selection approaches, with reliable, efficient and effective results.
Preprint
Full-text available
In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which can be applied in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve individual fairness in clustering. We integrate the exponential tilting into the sum of squared errors (SSE) to formulate a novel objective function called tilted SSE. We demonstrate that the tilted SSE can generalize to SSE and employ the coordinate descent and first-order gradient method for optimization. We propose a novel fairness metric, the variance of the distances within each cluster, which can alleviate the Matthew Effect typically caused by existing fairness metrics. Our theoretical analysis demonstrates that the well-known k-means++ incurs a multiplicative error of O(k log k), and we establish the convergence of TKM under mild conditions. In terms of fairness, we prove that the variance generated by TKM decreases with a scaled hyperparameter. In terms of efficiency, we demonstrate the time complexity is linear with the dataset size. Our experiments demonstrate that TKM outperforms state-of-the-art methods in effectiveness, fairness, and efficiency.
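The "exponential tilting of the SSE" mentioned in this preprint typically takes a log-sum-exp form; as a hedged reading (the exact objective is defined in the cited preprint, and the notation here is mine: t is the tilt hyperparameter and c_pi(i) the center assigned to point x_i):

```latex
\min_{c_1,\dots,c_k}\;
\frac{1}{t}\,\log\!\left(\frac{1}{n}\sum_{i=1}^{n}
  \exp\!\bigl(t\,\lVert x_i - c_{\pi(i)}\rVert^{2}\bigr)\right)
```

As t approaches 0 this recovers the ordinary (mean) SSE, while larger t up-weights the worst-served points, which matches the individual-fairness and variance-reduction behavior the abstract describes.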
Chapter
In recent years, deep neural networks (DNNs) have been used in a wide range of applications. However, there is a societal concern about the ability of DNNs to make sound and equitable decisions, particularly when they are used in sensitive areas where valuable resources are allocated, such as education, loans, and employment. Before reliable deployment of DNNs in such sensitive domains, it is essential to perform fairness testing, i.e., to generate as many instances as possible that uncover fairness violations. However, current testing methods are still restricted in the aspects of interpretability, performance, and generalizability. To overcome these challenges, we propose a new DNN fairness testing framework that differs from previous work in several key aspects: (1) interpretable—it quantitatively interprets DNNs’ fairness violations for the biased decision; (2) effective—it uses the interpretation results to guide the generation of more diverse instances in less time; (3) generic—it can handle both structured and unstructured data. A large number of DNNs are used to evaluate the performance of our method. For example, on structured datasets, the instances generated by our method can also be exploited to increase the fairness of the biased DNNs.
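A fairness violation of the kind such testing searches for is commonly operationalized as a pair of inputs that differ only in a sensitive attribute yet receive different predictions. A minimal generic check along those lines (the model, attribute index, and function name are placeholders, not the chapter's actual framework):

```python
def find_fairness_violations(model, inputs, sensitive_idx, sensitive_values):
    """Return pairs of inputs differing only in the sensitive attribute
    but receiving different predictions from the model."""
    violations = []
    for x in inputs:
        base = model(x)
        for v in sensitive_values:
            if v == x[sensitive_idx]:
                continue
            x_alt = list(x)
            x_alt[sensitive_idx] = v            # flip only the sensitive attribute
            if model(tuple(x_alt)) != base:     # prediction changed: violation found
                violations.append((x, tuple(x_alt)))
    return violations
```

Interpretation-guided generators like the one described aim to find such pairs far faster than this exhaustive flip-and-compare sweep, by steering the search toward regions where the model's decision is most attribute-sensitive.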
Chapter
Although deep neural networks (DNNs) have shown superior performance in different software systems, they also display malfunctioning and can even lead to irreversible catastrophes. Hence, it is significant to detect the misbehavior of DNN-based software and enhance the quality of DNNs. Test input prioritization is a highly effective approach to ensure the quality of DNNs. This method involves prioritizing test inputs in such a way that inputs that are more likely to reveal bugs or issues are identified early on, even with limited time and manual labeling efforts. Nevertheless, current prioritization methods still have limitations in three aspects: certifiability, effectiveness, and generalizability. To overcome the challenges, we propose a test input prioritization technique designed based on a movement cost perspective of test inputs in DNNs’ feature space. Our method differs from previous works in three key aspects: (1) certifiable—it provides a formal robustness guarantee for the movement cost; (2) effective—it leverages formally guaranteed movement costs to identify malicious bug-revealing inputs; and (3) generic—it can be applied to various tasks, data, models, and scenarios. Extensive evaluations across two tasks (i.e., classification and regression), six data forms, four model structures, and two scenarios (i.e., white box and black box) demonstrate our method’s superior performance. For instance, it improves prioritization effectiveness by 53.97% on average compared with baselines. Its robustness and generalizability are 1.41 to 2.00 times and 1.33 to 3.39 times those of the baselines on average, respectively.
Chapter
Federated learning (FL), in which multiple clients collaborate to train a federated model without exchanging their individual data, is a method of distributed machine learning. Although federated learning has achieved unprecedented success in data privacy preservation, its vulnerability to “free-rider” attacks is attracting increasing attention. A number of defenses against free-rider attacks have been proposed for FL. Nevertheless, these methods may not protect against highly disguised free-riders. Furthermore, when more than 20% of the clients are free-riders, the effectiveness of these defenses may drop dramatically. To tackle these challenges, we reconceptualize the defense problem from a new perspective, i.e., the frequency of model weight evolution. We gain a new insight: the frequency of model weight evolution differs significantly between free-riders and benign clients during the FL training process. Motivated by this insight, a novel defense method based on the frequency of model weight evolution is proposed. In particular, the frequency of weight changes during the local training process is first collected. Each client records this in a WEF-Matrix for its local model and uploads it, together with its model weights, to the server at each iteration. The server then separates free-riders from benign clients based on differences in the WEF-Matrix. Finally, the server uses a personalized method to deliver different global models to the corresponding clients, thus preventing free-riders from obtaining high-value models. Combined experiments on five datasets and five models show that our method defends better than the state-of-the-art baselines and identifies free-riders at an early stage of training. Furthermore, we also verify the effectiveness of our method against adaptive attacks and visualize the WEF-Matrix during training to explain its effectiveness.
Article
Federated Learning (FL) provides a privacy-preserving and decentralized approach to collaborative machine learning for multiple FL clients. The contribution estimation mechanism in FL is extensively studied within the database community, which aims to compute fair and reasonable contribution scores as incentives to motivate FL clients. However, designing such methods involves challenges in three aspects: effectiveness, robustness, and efficiency. Firstly, contribution estimation methods should utilize the data utility information of various client coalitions rather than that of individual clients to ensure effectiveness. Secondly, we should beware of adverse clients who may exploit tactics like data replication or label flipping. Thirdly, estimating contribution in FL can be time-consuming due to enumerating various client coalitions. Despite numerous proposed methods to address these challenges, each possesses distinct advantages and limitations based on specific settings. However, existing methods have yet to be thoroughly evaluated and compared in the same experimental framework. Therefore, a unified and comprehensive evaluation framework is necessary to compare these methods under the same experimental settings. This paper conducts an extensive survey of contribution estimation methods in FL and introduces a comprehensive framework to evaluate their effectiveness, robustness, and efficiency. Through empirical results, we present extensive observations, valuable discoveries, and an adaptable testing framework that can facilitate future research in designing and evaluating contribution estimation methods in FL.
Article
Full-text available
Ph.D. thesis, Department of Computer Science, Stanford University, 1995. Copyright by the author.
Article
Presented here are results on almost sure convergence of estimators of regression functions subject to certain moment restrictions. Two somewhat different notions of almost sure convergence are studied: unconditional and conditional given a training sample. The estimators are local means derived from certain recursive partitioning schemes.
Conference Paper
The paper presents a case study in examining the bias of two particular formalisms: decision trees and linear threshold units. The immediate result is a new hybrid representation, called a perceptron tree, and an associated learning algorithm called the perceptron tree error correction procedure. The longer-term result is a model for exploring issues related to understanding representational bias and constructing other useful hybrid representations.
Article
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees Assistant, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system is discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.