Multivariate Cluster-based Discretization for
Bayesian Network Structure Learning

Ahmed Mabrouk¹, Christophe Gonzales², Karine Jabet-Chevalier¹, and Eric Chojnaki¹

¹ Institut de radioprotection et de sûreté nucléaire, France
{Ahmed.Mabrouk,Karine.Chevalier-Jabet,Eric.Chojnaki}@irsn.fr
² Sorbonne Universités, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, France
Christophe.Gonzales@lip6.fr
Abstract. While there exist many efficient algorithms in the literature
for learning Bayesian networks with discrete random variables, learning
when some variables are discrete and others are continuous is still an
issue. A common way to tackle this problem is to preprocess datasets
by first discretizing continuous variables and, then, resorting to classical
discrete variable-based learning algorithms. However, such a method is
inefficient because the conditional dependences/arcs learnt during the
learning phase bring valuable information that cannot be exploited by
the discretization algorithm, thereby preventing it to be fully effective.
In this paper, we advocate to discretize while learning and we propose a
new multivariate discretization algorithm that takes into account all the
conditional dependences/arcs learnt so far. Unlike popular discretization
methods, ours does not rely on entropy but on clustering using an EM
scheme based on a Gaussian mixture model. Experiments show that our
method significantly outperforms the state-of-the-art algorithms.
Keywords: Multivariate discretization, Bayesian network learning.
1 Introduction
For several decades, Bayesian networks (BN) have been successfully exploited
for dealing with uncertainties. However, while their learning and inference mech-
anisms are relatively well understood when they involve only discrete variables,
the way they cope with continuous variables is still often unsatisfactory. One actually
has to trade-off between expressiveness and computational complexity: on one
hand, conditional Gaussian models and their combination with discrete variables are
computationally efficient but they definitely lack some expressiveness [12]; on the
other hand, mixtures of truncated exponentials, basis functions or polynomials are
very expressive but at the expense of tractability [15,20]. In between lie discretization meth-
ods which, by converting continuous variables into discrete ones, can provide a
satisfactory trade-off between expressiveness and tractability.
Acknowledgment: This work was supported by the French Institute for Radiopro-
tection and Nuclear Safety (IRSN), the Belgium’s nuclear safety authorities (Bel V)
and European project H2020-ICT-2014-1 #644425 Scissor.
In many real-world applications, BNs are learnt from data and, when there
exist continuous attributes, those are often discretized prior to learning, thereby
opening the path to exploiting efficient discrete variable-based learning algorithms.
However, such an approach is doomed to be ineffective because the
conditional dependences/arcs learnt during the learning phase bring valuable
information that cannot be exploited by the discretization algorithm, thereby
severely limiting its effectiveness. Yet there exist surprisingly few papers
on discretizing while learning, probably because it incurs substantial computa-
tional costs and it requires multivariate discretization instead of just a univariate
one. In this direction, MDL and Bayesian scores used by search algorithms have
been adapted to include multivariate discretizations taking into account the BN
structure learnt so far [6,13]. But, to fit naturally into these scores, such
discretizations heavily rely on entropy-related maximizations which, as we shall see, are
not very well suited to BN learning. In [21], a non-linear dimensionality reduc-
tion process called GP-LVM combined with a Gaussian mixture model-based
discretization is proposed for BN learning. Unfortunately, GP-LVM loses the
random variable’s semantics and the discretization does not rely on the BN struc-
ture. As a consequence, the method does not exploit all the useful information.
Unlike in BN learning, multivariate discretization has often been exploited
in Machine Learning for supervised classification tasks [1,2,5,9,22]. But the goal
is only to maximize the classification power w.r.t. one target variable. As such,
only the individual correlations of each variable with the target are of inter-
est and, thus, only bivariate discretization is needed. BN structure learning is
fundamentally different because the complete set of conditional dependences be-
tween all sets of variables is of interest and multivariate discretization shall most
often involve more than two variables. This makes these approaches not easily
transferable to BN learning. In [11], the authors propose a general multivariate
discretization relying on genetic algorithms to construct rulesets. However, the
approach is very limited because it is designed to cope with only one target and
the domain size of this variable needs to be small to keep the method tractable.
Discretizations have also been exploited in unsupervised learning (UL), but
those are essentially univariate [4,8,16,17], which makes them usable per se only
as a preprocess prior to learning. However, BN learning can be related to UL
in the sense that all the BN’s variables can be thought of as targets whose dis-
cretized values are unobserved. This suggests that some key ideas underlying UL
algorithms might be adapted for learning BN structures. Clustering is one such
popular framework. In [14], for instance, multivariate discretization is performed
by clustering but, unfortunately, independences between random variables are
only considered given a latent variable. This limits considerably the range of
applications of the method because numerous continuous variables require the
latent one to have a large domain size in order to get good quality discretizations.
This approach is therefore limited to small datasets and, by not exploiting the
BN structure, it is best suited as a BN learning preprocess. Finally, by relying on
entropy, its effectiveness for BN learning is certainly not optimal. However, here,
we advocate to exploit clustering methods for discretization w.r.t. BN learning.
More precisely, we propose a new clustering-based approach for multivariate
discretization that takes into account the conditional dependences among vari-
ables discovered during learning. By exploiting clustering rather than entropy,
it avoids the shortcomings induced by the latter and, by taking into account the
dependences between random variables, it significantly increases the quality of
the discretization compared to state-of-the-art clustering approaches.
The rest of the paper is organized as follows. Section 2 recalls BN learning
and discretizations. Then, in Section 3, we describe our approach and justify
its correctness. Its effectiveness is highlighted through experiments in Section 4.
Finally, some concluding remarks are given in Section 5.
2 Basics on BN Structure Learning and Discretization
Uppercase letters X, Z (resp. lowercase letters x, z) represent random variables
(resp. their instantiations). Boldface letters represent sets.
Definition 1. A (discrete) BN is a pair (G, θ) where G = (X, A) is a directed
acyclic graph (DAG), X = {X_1, ..., X_n} represents a set of discrete random
variables¹, A is a set of arcs, and θ = {P(X_i | Pa(X_i))}_{i=1}^n is the set of the
conditional probability distributions (CPT) of the variables X_i in G given their
parents Pa(X_i) in G. The BN encodes the joint probability over X as:

P(X) = ∏_{i=1}^n P(X_i | Pa(X_i)).   (1)
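As a concrete illustration of the factorization of Eq. (1) (our example, not part of the paper: a toy two-node BN A → B with made-up CPT entries), the joint distribution is recovered by multiplying each node's CPT given its parents:

```python
# Sketch of Eq. (1): the joint distribution of a discrete BN is the product
# of each node's CPT given its parents. Hypothetical 2-node BN: A -> B.
P_A = {0: 0.3, 1: 0.7}                  # P(A)
P_B_given_A = {0: {0: 0.9, 1: 0.1},     # P(B | A = 0)
               1: {0: 0.4, 1: 0.6}}     # P(B | A = 1)

def joint(a, b):
    """P(A=a, B=b) = P(A=a) * P(B=b | A=a)."""
    return P_A[a] * P_B_given_A[a][b]

# Sanity check: the joint sums to 1 over all configurations.
total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
```

For larger BNs the same product simply extends over all n CPTs.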
To avoid ambiguities between continuous variables and their discretized
counterparts, letters superscripted by "◦", e.g., ˚X, ˚x, represent variables and
their instantiations prior to discretization; otherwise they are discretized (for
discrete variables, X = ˚X and x = ˚x). In the rest of the paper, n always denotes
the number of variables in the BN, and we assume that ˚X_1, ..., ˚X_l are discrete
whereas ˚X_{l+1}, ..., ˚X_n are continuous. ˚D and D denote the input databases before
and after discretization respectively and are assumed to be complete, i.e., they do
not contain any missing data. N refers to their number of records.
Given ˚D = {˚x^(1), ˚x^(2), ..., ˚x^(N)}, BN learning consists of finding the DAG G that
most likely accounts for the observed data in ˚D. When all variables are discrete,
i.e., D = ˚D, there exist many efficient algorithms in the literature for solving
this task. Those can be divided into 3 classes [10]: i) the search-based approaches
that look for the structure optimizing a score (BD, BDeu, BIC, AIC, K2, etc.);
ii) the constraint-based approaches that exploit statistical independence tests
(χ², G², etc.) to find the best structure G; iii) the hybrid methods that exploit
a combination of both. In the rest of the paper, we will focus on search-based
approaches because our closest competitors, [13,6], belong to this class.
Basically, these algorithms start with a structure G_0 (often empty). Then, at
each step, they look in the neighborhood of the current structure for another
structure, say G, that increases the likelihood of structure G given observations
D, i.e., P(G|D). The neighborhood is often defined as the set of graphs that differ
from the current one only by one atomic graphical modification (arc addition, arc
deletion, arc reversal). P(G|D) is computed locally through the aforementioned
scores, their differences stemming essentially from different a priori hypotheses.
More precisely, assuming a uniform prior on all structures G, we have that:

P(G|D) = P(D|G)P(G) / P(D) ∝ P(D|G) = ∫_θ P(D|G, θ) π(θ|G) dθ,   (2)

where θ is the set of parameters of the CPTs of a (discrete) BN with structure
G. Different hypotheses on prior π and on θ result in the different scores (see,
e.g., [18] for the hypotheses for the BIC score used later).

¹ By abuse of notation, we use interchangeably X_i ∈ X to denote a node in the BN
and its corresponding random variable.
When database ˚D contains continuous variables, those can be discretized. A
discretization of a continuous variable ˚X is a function f : R → {0, ..., g} defined
by an increasing sequence of g cut points {t_1, t_2, ..., t_g} such that:

f(˚x) = 0 if ˚x < t_1,
f(˚x) = k if t_k ≤ ˚x < t_{k+1}, for all k ∈ {1, ..., g − 1},
f(˚x) = g if ˚x ≥ t_g.
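For concreteness, this piecewise definition of f can be sketched directly (our illustration; `bisect_right` counts the cut points ≤ ˚x, which is exactly the interval index k above):

```python
import bisect

def make_discretizer(cut_points):
    """Return f: R -> {0, ..., g} for increasing cut points t_1 < ... < t_g,
    as defined above: f(x) = k iff t_k <= x < t_{k+1}
    (with the conventions t_0 = -inf and t_{g+1} = +inf)."""
    ts = sorted(cut_points)
    return lambda x: bisect.bisect_right(ts, x)

# g = 3 cut points define 4 intervals, labeled 0..3.
f = make_discretizer([1.0, 2.5, 4.0])
```

Applying f record-wise to a column of ˚D yields the corresponding discretized column of D.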
Let F be a set of discretization functions, one for each continuous variable. Then,
given F, if D denotes the (unique) database resulting from the discretization of
˚D by F, Eq. (2) becomes:

P(G|˚D, F) ∝ P(˚D|G, F) = P(D|˚D, G, F) P(˚D|G, F) = P(˚D|D, G, F) P(D|G, F).

Assuming that all databases ˚D compatible with D given F are equiprobable, we
thus have that:

P(G|˚D, F) ∝ P(D|G, F) = ∫_θ P(D|G, F, θ) π(θ|F, G) dθ.   (3)
BN structure learning therefore amounts to finding the structure G* such that
G* = Argmax_G P(G|D, F). Note that P(D|G, F, θ) corresponds to a classical score
over discrete data. π(θ|F, G) is the prior over the parameters of the BN given F.
Eq. (3) is precisely the one used when discretization is performed as a preprocess
before learning.

When discretization is performed while learning, like in [6,13], both the
structure and the discretization should be optimized simultaneously. In other words,
the problem consists of computing Argmax_{F,G} P(G, F|˚D), where finding the best
discretization amounts to finding the best set of cut points (including the best size
for this set) for each continuous random variable. And we have that:
P(G, F|˚D) = P(G|F, ˚D) P(F|˚D) ∝ P(F|˚D) ∫_θ P(D|G, F, θ) π(θ|F, G) dθ.   (4)

As can be seen, the resulting equation combines the classical score on the
discretized data (the integral) with a score P(F|˚D) for the discretization
algorithm itself. The logarithm of the latter corresponds to what [6] and [13] call
DL_Λ(Λ) + DL_{˚D→D}(˚D, Λ) and S^c(Λ; ˚D) respectively.
Input: a database ˚D, an initial graph G, a score function sc on discrete variables
Output: the structure G of the Bayesian network
1 repeat
2     Find the best discretization F given G
3     {X_{l+1}, ..., X_n} ← discretize variables {˚X_{l+1}, ..., ˚X_n} given F
4     G ← G's neighbor that maximizes scoring function sc w.r.t. {X_1, ..., X_n}
5 until G maximizes the score;
Algorithm 1: Our structure learning architecture.
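The control flow of Algo. 1 — alternate between optimizing F given G and G given F, and stop when the score no longer improves — can be sketched as follows. This is our toy stand-in, not the paper's implementation: the subroutines of Lines 2–4 are abstracted into a precomputed sequence of scores, one per joint (discretize, search) step, so only the stopping logic of Line 5 is shown.

```python
def alternate_until_no_gain(step_scores):
    """Toy sketch of Algo. 1's outer loop: each entry of `step_scores`
    simulates the score of the structure obtained after one joint
    (discretization, neighbor-search) step; stop as soon as a step
    fails to improve the best score seen so far.
    Returns (number of improving steps taken, best score)."""
    best, steps = float("-inf"), 0
    for s in step_scores:
        if s <= best:          # Line 5: no improvement, local optimum reached
            break
        best, steps = s, steps + 1
    return steps, best

# Hypothetical score trace: improves three times, then plateaus.
steps, best = alternate_until_no_gain([-120.0, -95.5, -90.1, -90.1, -89.0])
```

In the real algorithm each score would come from evaluating sc on the freshly discretized database.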
3 A New Multivariate Discretization-Learning Algorithm
As mentioned earlier, we believe that taking into account the conditional de-
pendences between random variables is important to provide high-quality dis-
cretizations. Our approach thus follows Eq. (4) and our goal is to compute
Argmax
F,G
P (G, F|
˚
D). Optimizing jointly over F and G is too computation-
ally intensive a task to be usable in practice. Fortunately, we can approximate
it efficiently through a gradient descent, alternating optimizations over F given
a fixed structure G and optimizations over G given a fixed discretization F. This
suggests the BN structure learning method described as Algo. 1.
Multivariate discretization is much more time consuming than univariate
discretization. Line 2 could thus incur a strong overhead on the learning
algorithm because the discretization search space increases exponentially with
the number of variables to discretize. To alleviate this problem without
sacrificing too much accuracy, we suggest a local search algorithm that iteratively
fixes the discretizations of all the continuous variables but one and optimizes
the discretization of the latter (given the other variables) until some stopping
criterion is met. As such, discretizations being optimized one continuous variable
at a time, the combinatorics and the computation time are significantly limited.
Line 2 can thus be detailed as Algo. 2.
3.1 Discretization Criterion

To implement Algo. 2, a discretization criterion to be optimized is needed.

Input: a database ˚D, a graph G, a scoring function sc on discrete variables
Output: a discretization F
1 repeat
2     i_0 ← Select an element in {l + 1, ..., n}
3     Discretize ˚X_{i_0} given G and {X_1, ..., X_{i_0−1}, X_{i_0+1}, ..., X_n}
4 until stopping condition;
Algorithm 2: One-variable discretization architecture.

Basic ideas include trying to find cut points minimizing the discrepancy between
the frequencies or the sizes of the intervals [t_k, t_{k+1}). A more sophisticated approach
consists of limiting as much as possible the quantity of information lost after
discretization or, equivalently, maximizing the quantity of information remaining
after it. This naturally calls for maximizing an entropy. This is
essentially what our closest competitors, [6,13], do.
But entropy may not be the most appropriate measure when dealing with
BNs. Actually, consider a variable A with domain {a_1, a_2, a_3}. Then, it is
possible that, for some BN, P(A = a_1) = 1/6, P(A = a_2) = 1/3 and P(A = a_3) = 1/2.
With a sufficiently large database D, the frequencies of observations of a_1, a_2, a_3
in D would certainly lead to estimate P(A) ≈ [1/6, 1/3, 1/2]. Now, assume that the
observations in D are noisy, say with a Gaussian noise with an infinitely small
variance, as in Fig. 1. Then, after discretization, we shall expect to have 3
intervals with respective frequencies 1/6, 1/3 and 1/2, i.e., intervals similar to
(−∞, t_1), [t_1, t_2) and [t_2, +∞) of Fig. 1. However, w.r.t. entropy, the best
discretization corresponds to intervals (−∞, s_1), [s_1, s_2) and [s_2, +∞) of Fig. 1,
whose frequencies are all approximately equal to 1/3 (entropy is maximal for
equiprobable intervals). Therefore, whatever the infinitesimal noise added to data in D, an
entropy-based discretization produces a discretized variable A with distribution
[1/3, 1/3, 1/3] instead of [1/6, 1/3, 1/2]. This suggests that entropy is probably not the best
criterion for discretizing continuous variables for BN learning.
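The argument is easy to verify numerically (our check, using only the two candidate distributions above): Shannon entropy is indeed larger for the equiprobable discretization, however poorly it matches the generating distribution.

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

true_dist = [1/6, 1/3, 1/2]   # frequencies under cut points t_1, t_2
uniform   = [1/3, 1/3, 1/3]   # frequencies under cut points s_1, s_2
```

Since entropy peaks at the uniform distribution (log 3 here), an entropy maximizer will always prefer the s-cut points, whatever the true frequencies.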
Fig. 1 suggests that clustering would probably be more appropriate: here, one
cluster/interval per Gaussian would provide a better discretization. In this paper,
we assume that, within every interval, each continuous random variable, say ˚X_{i_0},
is distributed w.r.t. a truncated Gaussian. Over its whole domain of definition,
it is thus distributed as a mixture of truncated Gaussians, the weights of the
latter being precisely the CPT of X_{i_0} in the discrete BN. In particular, if ˚X_{i_0}
has some parents, there are as many mixtures as the product of the domain sizes
of the parents. The parameters of such a discretization scheme are therefore: i) a
set of g cut points (to define g + 1 intervals) and ii) a mean and a variance for
each interval (to define its Gaussian). Fig. 1 actually illustrates the fact that the
means of the Gaussians need not necessarily correspond to the middles of the
intervals. For instance, the mean of the third Gaussian is a_3 whereas the third
interval, [t_2, +∞), has no finite middle. Here, even finite interval middles, like
that of [t_1, t_2), do not correspond to the means of the Gaussians.

[Figure omitted: three Gaussian densities with means a_1, a_2, a_3, together with the cut points t_1, t_2 and s_1, s_2.]
Fig. 1. Discretization: entropy vs. clustering.

For each continuous random variable ˚X_{i_0}, this joint optimization problem is
really hard due to the normalization requirement that the integral of the
truncated Gaussian over each interval must equal 1 (which cannot be expressed
using closed-form formulas). Therefore, to alleviate the discretization computational
burden, we propose to approximate the computation of the cut points,
means and variances using a two-step process: first, we approximate the density
of the joint distribution of {X_1, ..., X_{i_0−1}, ˚X_{i_0}, X_{i_0+1}, ..., X_n} as a mixture of
untruncated Gaussians and we determine by maximum likelihood the number
of cut points as well as the means and variances of the Gaussians. This can be
easily done by an Expectation-Maximization (EM) approach. Then, in a second
step, we compute the best cut points w.r.t. the Gaussians. As each Gaussian is
associated with an interval, the parts of the Gaussian outside the interval can
be considered as a loss of information and we will therefore look for cut points
that minimize this loss. Now, let us delve into the details of the approach.
3.2 Discretization Exploiting the BN Structure

For the first discretization step of ˚X_{i_0}, we estimate the number g of cut points
and the Gaussians' means and variances. Assume that structure G is fixed and
that all the other variables are discrete. The density over all the variables, p(˚X),
is equal to p(˚X_{i_0} | Pa(˚X_{i_0})) ∏_{i≠i_0} P(X_i | Pa(X_i)), where p(˚X_{i_0} | Pa(˚X_{i_0}))
represents a mixture of Gaussians for each value of ˚X_{i_0}'s parents (there is a finite
number of values since all the variables but ˚X_{i_0} are discrete). P(X_i | Pa(X_i))
should be the CPT of discrete variable X_i but, unfortunately, it is not well
defined if ˚X_{i_0} ∈ Pa(X_i) because, in this case, Pa(X_i) has infinitely many values.
This is a serious issue since this CPT is used in the computation of P(D|G, F, θ)
of Eq. (4). Fortunately, this problem can be overcome by enforcing that ˚X_{i_0} has
no child while guaranteeing that the density remains unchanged. Actually, in
[19], an arc reversal operator is provided that, when applied, never alters the
density/probability distribution. More precisely, when reverting arc X → Y,
Shachter showed that if all the parents of X are added to Y and all the parents
of Y except X are added to X, then the resulting BN encodes the same
distribution. As an example of these transformations, reversing arc X → V of Fig. 2.(a)
results in Fig. 2.(b) and, then, reversing arc X → W results in Fig. 2.(c).

Therefore, to enforce that ˚X_{i_0} has no child, if {i_1, ..., i_c} denotes the set of
indices of the children variables of ˚X_{i_0}, sorted by a topological order of G, then,
by sequentially reversing all the arcs ˚X_{i_0} → X_{i_j}, j = 1, ..., c, we get:

p(˚X) = p(˚X_{i_0} | Pa(˚X_{i_0})) × ∏_{i ∉ {i_0,...,i_c}} P(X_i | Pa(X_i)) × ∏_{j=1}^c P(X_{i_j} | Pa(X_{i_j})),
[Figure omitted: three DAGs over nodes P, Q, U, V, W, X, Y, Z, labeled (a), (b), (c).]
Fig. 2. Shachter's arc reversals.
p(˚X) = p(˚X_{i_0} | MB(˚X_{i_0})) × ∏_{i ∉ {i_0,...,i_c}} P(X_i | Pa(X_i))
        × ∏_{j=1}^c P(X_{i_j} | ⋃_{h=1}^j (Pa(X_{i_h}) \ {˚X_{i_0}}) ∪ Pa(˚X_{i_0})),

where MB(˚X_{i_0}) is the Markov blanket of ˚X_{i_0} in G:

Definition 2. The Markov blanket of any node in G is the set of its parents, its
children and the other parents of its children.
Note that, in the last expression of p(˚X), only the first term involves ˚X_{i_0},
hence all the other CPTs are well defined (they are finite CPTs). As a side effect,
only p(˚X_{i_0} | MB(˚X_{i_0})) needs to be taken into account to discretize ˚X_{i_0} since
none of the other terms is related to ˚X_{i_0}. It shall be noted here that these arc
reversals are applied only for determining the parameters of the discretization, i.e.,
the set of cut points, means and variances of the Gaussians; they are never used to
learn the BN structure. Now, let us see how the parameters of the mixture of
Gaussians p(˚X_{i_0} | MB(˚X_{i_0})) maximizing the likelihood of dataset ˚D can be easily
estimated using an EM algorithm.
3.3 Parameter Estimation by an EM Algorithm

Let q_{i_0} represent the (finite) number of values of MB(˚X_{i_0}). For simplicity, we
will denote by {1, ..., q_{i_0}} the set of values of the joint discrete random variable
MB(˚X_{i_0}). Let g denote the number of cut points in the discretization and let
{N(µ_k, σ_k) : k ∈ {0, ..., g}} be the corresponding set of Gaussians. Then:

p(˚X_{i_0} = ˚x_{i_0} | MB(˚X_{i_0}) = j) = ∑_{k=0}^g π_{jk} f(˚x_{i_0} | θ_k)   ∀ j ∈ {1, ..., q_{i_0}},

where f(·|θ_k) represents the density of the normal distribution of parameters
θ_k = (µ_k, σ_k), and π_{jk} represents the weights of the mixture (with the constraints
that π_{jk} ≥ 0 for all j, k and ∑_{k=0}^g π_{jk} = 1 for all j). Remember that each value
of MB(˚X_{i_0}) induces its own set of weights {π_{j0}, ..., π_{jg}}. Now, we propose to
estimate parameters θ_k from ˚D by maximum likelihood. For this, EM is well
known to efficiently provide good approximations [3] (due to the mixture, the
direct maximum likelihood is actually hard to estimate). Assuming that the data
in ˚D are i.i.d., the log-likelihood of ˚D given Θ = ⋃_{k=0}^g (⋃_{j=1}^{q_{i_0}} {π_{jk}} ∪ {θ_k}) is
equal to:

L(˚D|Θ) = ∑_{m=1}^N log p(˚X_{i_0} = ˚x_{i_0}^{(m)} | MB(˚x_{i_0})^{(m)}, Θ),

where ˚x_{i_0}^{(m)} represents the observed value of ˚X_{i_0} in the mth record of ˚D. Thus:

L(˚D|Θ) = ∑_{j=1}^{q_{i_0}} ∑_{m: MB(˚x_{i_0})^{(m)} = j} log [ ∑_{k=0}^g π_{jk} f(˚x_{i_0}^{(m)} | θ_k) ].   (5)
To solve Argmax_Θ L(˚D|Θ), EM [3] iteratively alternates expectations (E-step)
and maximizations (M-step) until convergence toward a local maximum of the
log-likelihood function. In this paper, we just need to apply the standard
EM, considering for the weights π_{jk} only the records in the database that
correspond to MB(˚x_{i_0})^{(m)} = j. More precisely, for each record of ˚D, let Z^{(m)}
be a random variable whose domain is {0, ..., g}, and such that Z^{(m)} = k if and
only if observation ˚x_{i_0}^{(m)} has been generated from the kth Gaussian. Let
Q_m^t(Z^{(m)}) = P(Z^{(m)} | ˚x_{i_0}^{(m)}, Θ^t), i.e., Q_m^t(Z^{(m)}) represents the distribution
that, at the tth step of the algorithm, ˚x_{i_0}^{(m)} is believed to have been generated
by such and such Gaussian. Then, EM is described in Algo. 3.
In the EM algorithm, only the M-step can be computationally intensive.
Fortunately, here, we can derive in closed form the optimal values of Line 4:

Proposition 1. At the E-step, probability

Q_m^{t+1}(k) = π_{jk}^t f(˚x_{i_0}^{(m)} | θ_k^t) / ∑_{k'=0}^g π_{jk'}^t f(˚x_{i_0}^{(m)} | θ_{k'}^t),

where π_{jk}^t and θ_k^t are the weights, means and variances in Θ^t. The optimal
parameters of the M-step are respectively:

π_{jk}^{t+1} = ∑_{m: MB(˚x_{i_0})^{(m)} = j} Q_m^{t+1}(k) / ∑_{m: MB(˚x_{i_0})^{(m)} = j} ∑_{k'=0}^g Q_m^{t+1}(k'),

µ_k^{t+1} = ∑_{m=1}^N Q_m^{t+1}(k) ˚x_{i_0}^{(m)} / ∑_{m=1}^N Q_m^{t+1}(k),

σ_k^{t+1} = sqrt( ∑_{m=1}^N Q_m^{t+1}(k) (˚x_{i_0}^{(m)} − µ_k^{t+1})² / ∑_{m=1}^N Q_m^{t+1}(k) ).
Using Algo. 3 with the formulas of Proposition 1, it is thus possible to
determine the means and variances of the Gaussians. However, our ultimate goal is not
to compute them but to exploit them to discretize variable ˚X_{i_0}, i.e., to determine
the best cut points t_1, ..., t_g. Let us see how this task can be performed.
Input: a database ˚D, a number g of cut points
Output: an optimal set of parameters Θ
1 Select (randomly) an initial value Θ^0
2 repeat
      // E-step (expectation)
3     Q_m^{t+1}(Z^{(m)}) ← P(Z^{(m)} | ˚x_{i_0}^{(m)}, Θ^t)   ∀ m ∈ {1, ..., N}
      // M-step (maximization)
4     Θ^{t+1} ← Argmax_Θ ∑_{j=1}^{q_{i_0}} ∑_{m: MB(˚x_{i_0})^{(m)} = j} ∑_{k=0}^g Q_m^{t+1}(k) log [ π_{jk} f(˚x_{i_0}^{(m)} | θ_k) / Q_m^{t+1}(k) ]
5 until convergence;
Algorithm 3: The EM algorithm.
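As an illustrative sketch (ours, not the paper's code), the E- and M-steps of Algo. 3 with the closed-form updates of Proposition 1 can be written for the simplest case q_{i_0} = 1, i.e., a single Markov-blanket configuration and hence a single weight vector; the quantile-based initialization is an arbitrary choice of ours:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm_1d(xs, g, n_iter=50):
    """EM for a 1-D mixture of g+1 Gaussians (the q_i0 = 1 case of Algo. 3).
    Returns (weights, means, sigmas)."""
    K = g + 1
    xs_sorted = sorted(xs)
    # Deterministic initialization (our choice): means at evenly spaced quantiles.
    mus = [xs_sorted[(2 * k + 1) * len(xs) // (2 * K)] for k in range(K)]
    spread = (xs_sorted[-1] - xs_sorted[0]) / (2 * K) or 1.0
    sigmas = [spread] * K
    pis = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: responsibilities Q_m(k) proportional to pi_k * f(x_m | theta_k).
        Q = []
        for x in xs:
            w = [pis[k] * normal_pdf(x, mus[k], sigmas[k]) for k in range(K)]
            s = sum(w) or 1e-300
            Q.append([wk / s for wk in w])
        # M-step: closed-form updates of Proposition 1.
        for k in range(K):
            nk = sum(q[k] for q in Q)
            if nk == 0:
                continue
            pis[k] = nk / len(xs)
            mus[k] = sum(q[k] * x for q, x in zip(Q, xs)) / nk
            var = sum(q[k] * (x - mus[k]) ** 2 for q, x in zip(Q, xs)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
    return pis, mus, sigmas

# Toy usage: 400 points from two well-separated Gaussians; with g = 1 cut point
# (two components), EM should recover means near 0 and 10.
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
pis, mus, sigmas = em_gmm_1d(xs, g=1)
```

The general q_{i_0} > 1 case keeps one weight vector per Markov-blanket value j but shares the (µ_k, σ_k) across values of j, exactly as in Proposition 1.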
3.4 Determination of the Cut Points

As mentioned at the end of Subsection 3.1, each Gaussian N(µ_k, σ_k) is associated
with an interval [t_k, t_{k+1})² and the parts of the Gaussian outside the interval
can be considered as a loss of information. The optimal set of cut points
T̂ = {t̂_1, ..., t̂_g} is thus that which minimizes this loss. In other words, it is equal to:

T̂ = Argmin_{t_1,...,t_g} ∑_{k=1}^g [ ∫_{t_k}^{+∞} f(x|θ_{k−1}) dx + ∫_{−∞}^{t_k} f(x|θ_k) dx ],
where θ_k represents the pair (µ_k, σ_k). As each Gaussian N(µ_k, σ_k) is associated
with interval [t_k, t_{k+1}), we can assume that t̂_k ∈ [µ_{k−1}, µ_k) for all k. Therefore:

T̂ = { Argmin_{t_k ∈ [µ_{k−1}, µ_k)} ∫_{t_k}^{+∞} f(x|θ_{k−1}) dx + ∫_{−∞}^{t_k} f(x|θ_k) dx : k ∈ {1, ..., g} }.   (6)

All the t̂_k can thus be determined independently. In addition, as shown below,
their values are the solutions of a quadratic equation:
Proposition 2. Let u(t_k) represent the sum of the integrals in Eq. (6). Let α_k
be a solution (if any) within interval (µ_{k−1}, µ_k) of the quadratic equation in t_k:

t_k² (1/σ_{k−1}² − 1/σ_k²) + 2 t_k (µ_k/σ_k² − µ_{k−1}/σ_{k−1}²) + µ_{k−1}²/σ_{k−1}² − µ_k²/σ_k² − 2 log(σ_k/σ_{k−1}) = 0.   (7)

Then t̂_k is, among {µ_{k−1}, µ_k, α_k}, the element with the smallest value of u(·)
(which can be quickly approximated using a table of the Normal distribution).
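Proposition 2 can be sketched as follows (our illustration): solve the quadratic Eq. (7) for the root(s) inside (µ_{k−1}, µ_k) and keep, among the interval bounds and these roots, the candidate minimizing the loss u(·) of Eq. (6), computed here with the exact Normal CDF (via math.erf) rather than a table:

```python
import math

def cut_point(mu0, s0, mu1, s1):
    """Cut point between adjacent Gaussians N(mu0, s0) and N(mu1, s1),
    mu0 < mu1, following Prop. 2: solve Eq. (7), then keep the candidate
    in {mu0, mu1, roots in (mu0, mu1)} minimizing the information loss u."""
    def u(t):  # mass of the left Gaussian above t + mass of the right below t
        F = lambda x, mu, s: 0.5 * (1 + math.erf((x - mu) / (s * math.sqrt(2))))
        return (1 - F(t, mu0, s0)) + F(t, mu1, s1)
    # Coefficients of Eq. (7): a*t^2 + b*t + c = 0.
    a = 1 / s0 ** 2 - 1 / s1 ** 2
    b = 2 * (mu1 / s1 ** 2 - mu0 / s0 ** 2)
    c = mu0 ** 2 / s0 ** 2 - mu1 ** 2 / s1 ** 2 - 2 * math.log(s1 / s0)
    candidates = [mu0, mu1]
    if abs(a) < 1e-12:                     # equal variances: Eq. (7) is linear
        if abs(b) > 1e-12:
            candidates.append(-c / b)
    else:
        disc = b * b - 4 * a * c
        if disc >= 0:
            for sign in (1, -1):
                r = (-b + sign * math.sqrt(disc)) / (2 * a)
                if mu0 < r < mu1:
                    candidates.append(r)
    return min(candidates, key=u)
```

With equal variances the cut point falls at the midpoint of the two means, as expected; with unequal variances it shifts toward the narrower Gaussian.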
Proof. Let g(·) and h(·) be two functions such that ∂g(x)/∂x = f(x|θ_{k−1}) and
∂h(x)/∂x = f(x|θ_k). Then:

t̂_k = Argmin_{t_k ∈ [µ_{k−1}, µ_k)} u(t_k)
    = Argmin_{t_k ∈ [µ_{k−1}, µ_k)} [ ∫_{t_k}^{+∞} (∂g(x)/∂x) dx + ∫_{−∞}^{t_k} (∂h(x)/∂x) dx ]
    = Argmin_{t_k ∈ [µ_{k−1}, µ_k)} [ −g(t_k) + h(t_k) + lim_{t→+∞} (g(t) − h(−t)) ].
Let us relax the optimization problem and try to find the Argmin over R. Then
the min is obtained when ∂u(t_k)/∂t_k = 0 or, equivalently, when
∂(−g(t_k) + h(t_k))/∂t_k = −f(t_k|θ_{k−1}) + f(t_k|θ_k) = 0. Since f(·|θ) represents
the density of the Normal distribution of parameters θ, this is equivalent to:

− (1/(√(2π) σ_{k−1})) exp[−(1/2) ((t_k − µ_{k−1})/σ_{k−1})²] + (1/(√(2π) σ_k)) exp[−(1/2) ((t_k − µ_k)/σ_k)²] = 0,
or, equivalently:
² Without loss of generality, we consider here that the µ_k's resulting from the EM
algorithm are sorted in increasing order.
σ_k / σ_{k−1} = exp[−(1/2) ((t_k − µ_k)/σ_k)²] / exp[−(1/2) ((t_k − µ_{k−1})/σ_{k−1})²]
            = exp[ (1/2) ((t_k − µ_{k−1})/σ_{k−1})² − (1/2) ((t_k − µ_k)/σ_k)² ],

which, by a log transformation, is equivalent to:

2 log(σ_k/σ_{k−1}) = t_k²/σ_{k−1}² − 2µ_{k−1}t_k/σ_{k−1}² + µ_{k−1}²/σ_{k−1}² − t_k²/σ_k² + 2µ_k t_k/σ_k² − µ_k²/σ_k².
This corresponds precisely to Eq. (7). So, to summarize, if the optimal solution
lies inside interval (µ_{k−1}, µ_k), then it satisfies Eq. (7). Otherwise, u(t_k)
is either strictly increasing or strictly decreasing within (µ_{k−1}, µ_k), which implies
that the optimal solution for t̂_k is either µ_{k−1} or µ_k, which completes the proof.
3.5 Score and Number of Cut Points

To complete the description of the algorithm, there remains to determine the
number of cut points. Of course, the higher the number of cut points, the higher
the likelihood but the lower the compactness of the representation. To reach
a good trade-off, we simply propose to exploit the penalty functions included
in the score used for the evaluation of the different BN structures (see Line 5 of
Algo. 1). Here, we used the BIC score [18], which can be locally expressed as:

BIC(˚X_{i_0} | MB(˚X_{i_0})) = L(˚D|Θ) − (|Θ|/2) log(N),   (8)

where L(˚D|Θ) is the log-likelihood with the parameters estimated by EM, given
the current structure G. |Θ| represents the number of parameters, i.e.,
|Θ| = q_{i_0} × g + 2 × (g + 1): the 1st and 2nd terms correspond to the number of
parameters π_{jk} and of (µ_k, σ_k) needed to encode the conditional distributions
(recall that there are g + 1 Gaussians and q_{i_0} represents the domain size of
MB(˚X_{i_0})). Now, the best number of cut points is simply that which optimizes Eq. (8).
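The selection of g by Eq. (8) can be sketched as follows (our illustration; in practice the log-likelihoods would come from the EM runs of Section 3.3, whereas here they are hypothetical numbers):

```python
import math

def bic(loglik, g, q, N):
    """Local BIC of Eq. (8): penalize the q*g free mixture weights and the
    2*(g+1) Gaussian parameters (one mean and one sigma per interval)."""
    n_params = q * g + 2 * (g + 1)
    return loglik - 0.5 * n_params * math.log(N)

def best_g(logliks_by_g, q, N):
    """Number of cut points maximizing the BIC score; `logliks_by_g[g]` is
    the EM log-likelihood obtained with g cut points."""
    return max(logliks_by_g, key=lambda g: bic(logliks_by_g[g], g, q, N))
```

Typically the likelihood keeps improving with g while the penalty grows linearly, so the maximizer is finite and small.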
4 Experiments
In this section, we highlight the effectiveness of our method, hereafter denoted
MGCD (for Mixture of Gaussians Clustering-based Discretization), by compar-
ing it with the algorithms provided in [17] and [6], hereafter called Ruichu and
Friedman respectively. Step 4 of Algo. 1 was performed using a simple Tabu
search method. For the comparisons, three criteria have been taken into account:
i) the quality of the structure learnt by the algorithm (which strongly depends
on that of the discretization); ii) the computation time and iii) the quality of
the learnt CPT parameters, which has been evaluated by their prediction power
on the values taken by some variables given observations.
For the first two criteria, we randomly generated discrete BNs following the
guidelines given in [7]. Those contained from 10 to 30 nodes and from 12 to 56
arcs. Each node had at most 6 parents and its domain size was randomly chosen
between 2 and 5. The CPTs of these BNs represented the π_{jk} of the preceding
section. From these BNs, we generated continuous datasets containing from 1000
to 10000 records as follows: for each random variable X_i, we mapped its finite set
of values into a set of consecutive intervals {[t_{k−1}, t_k)}_{k=1}^{|X_i|} of arbitrary
lengths. Then, we assigned a truncated Gaussian to each interval, the parameters
(µ_k, σ_k) of which were randomly chosen. Finally, to generate a continuous record,
we first generated a discrete record from the discrete BN using a logic sampling
algorithm. Then, this record was mapped into a continuous one by sampling from
the truncated Gaussians. Overall, 350 continuous datasets were generated.

[Figure omitted: TPR (left) and FPR (right) curves for MGCD, Ruichu and Friedman.]
Fig. 3. Averages of the TPR (left) and FPR (right) metrics for BNs with 10 to 30
nodes as a function of the sample sizes.
To compare them, the BN structures produced by Ruichu, Friedman and
MGCD were converted into their Markov equivalence class, i.e., into a partially
directed DAG (CPDAG). Such a transformation increases the quality of com-
parisons since two BNs encode the same distribution iff they belong to the same
equivalence class. The CPDAGs were then compared w.r.t. their true and false
positive rate metrics (TPR and FPR). TPR (resp. FPR) represents the percent-
age of arcs/edges belonging to the learnt CPDAG that also exist (resp. do not
exist) in the original CPDAG. Both metrics describe how well the dependences
between variables are preserved by learning/discretization. Fig. 3 shows the average TPR and FPR over the 350 generated databases. As can be seen, MGCD outperforms the others for all dataset sizes: MGCD's TPR is about 10% higher than Ruichu's and 40% higher than Friedman's, and MGCD's FPR is between 20% and 40% lower than both other methods'. MGCD's improvement over Ruichu's method can be explained by the fact that, unlike the latter, it fully takes into account the conditional dependences between all the random variables. Its improvement over Friedman's can be explained by our choice of exploiting clustering rather than an entropy-based approach. Table 1 provides computation time ratios (other method's runtime / MGCD's runtime). As can be seen, our method's runtime is comparable to or slightly better than Ruichu's (while being 10% better in terms of TPR and more than 20% better in terms of FPR) and it is significantly faster than Friedman's (about 3 times) while at the same time being 40% higher in terms of TPR.
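To make the metrics concrete, here is a minimal sketch of one common way to compute TPR/FPR from two CPDAGs, comparing adjacencies irrespective of orientation. The paper's exact computation may differ in detail, and all names below are ours.

```python
from itertools import combinations

def skeleton(edges):
    """Adjacencies of a CPDAG, ignoring edge orientation."""
    return {frozenset(e) for e in edges}

def tpr_fpr(true_edges, learnt_edges, nodes):
    """TPR: fraction of true adjacencies recovered in the learnt graph.
    FPR: fraction of truly absent pairs that the learnt graph connects."""
    truth, learnt = skeleton(true_edges), skeleton(learnt_edges)
    pairs = {frozenset(p) for p in combinations(nodes, 2)}
    tp = len(truth & learnt)          # correctly recovered adjacencies
    fp = len(learnt - truth)          # spurious adjacencies
    negatives = len(pairs - truth)    # pairs absent from the true graph
    return tp / len(truth), (fp / negatives if negatives else 0.0)
```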
Approaches / Dataset sizes 1000 5000 7500 10000
Friedman 2.762444 3.350782 3.404958 3.361540
Ruichu 0.8872389 1.1402535 1.1334637 1.1032982
Table 1. Runtime ratio comparisons between discretization approaches.
Finally, we compared the discretizations w.r.t. the quality of the produced CPTs. To do so, we generated 100 continuous databases from two classical BNs, Child and Sachs, using the same process as above, except that: i) the distributions inside intervals were uniform instead of Gaussian (to penalize our approach, since the data then do not fit its hypotheses), and ii) some small sets of variables were kept discrete and served as multilabel targets. Databases were split
into a learning (2/3) and a test (1/3) part. For each record in the latter, we
computed the distribution (learnt by each of the 3 algorithms on the learning
database) of each target given some observations on their Markov blanket and
we estimated the value of the target by sampling it from the learnt distribution.
The percentages of correct predictions are shown in Table 2. As can be seen, our algorithm outperforms the other two, especially Ruichu's, whose univariate discretization fails to yield accurate predictions because it does not take into account the conditional dependences among random variables. Friedman's results are closer to ours, but recall that his algorithm is about 3 times slower.
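The prediction protocol described above can be sketched as follows; `posterior_fn` is a stand-in for inference in the learnt BN, and all names are ours.

```python
import random

def predict_by_sampling(posterior):
    """Sample a value for the target from its learnt posterior distribution,
    given as a dict {value: probability}."""
    values, probs = zip(*posterior.items())
    return random.choices(values, weights=probs, k=1)[0]

def accuracy(test_records, posterior_fn, target):
    """Fraction of test records whose sampled prediction matches the
    record's actual target value; posterior_fn(evidence) must return
    P(target | evidence) as a dict."""
    hits = 0
    for rec in test_records:
        # everything except the target serves as evidence
        evidence = {v: x for v, x in rec.items() if v != target}
        if predict_by_sampling(posterior_fn(evidence)) == rec[target]:
            hits += 1
    return hits / len(test_records)
```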
5 Conclusion
We have proposed a new multivariate discretization algorithm designed for BN
structure learning, taking into account the dependences among variables acquired during learning. Our experiments highlighted its efficiency and effectiveness compared to state-of-the-art algorithms, but more experiments are of course needed to better assess the strengths and the shortcomings of our proposed approach. For future work, we plan to improve our algorithm, notably by working directly with truncated Gaussians instead of the current approximation by mixtures of Gaussians. Such an improvement is not trivial, however, because no closed-form solution then exists for determining the cut points.
dataset     30% Markov blanket        60% Markov blanket        100% Markov blanket
sizes     MGCD   Ruichu Friedman    MGCD   Ruichu Friedman    MGCD   Ruichu Friedman
Child
 1000     60.90  59.30  58.99       62.76  60.90  59.96       67.53  65.34  63.33
 2000     61.59  56.62  58.05       62.41  59.24  60.55       67.29  64.71  63.03
 5000     64.88  62.29  60.05       66.07  62.95  61.94       69.42  65.39  63.82
 10000    65.81  62.48  61.75       67.44  63.85  63.51       70.49  66.92  65.79
Sachs
 1000     56.63  54.78  57.59       57.22  55.04  58.74       65.67  61.06  64.65
 2000     56.96  56.16  54.02       59.72  57.58  56.64       65.80  62.24  60.22
 5000     57.69  55.00  55.15       59.80  57.96  56.38       65.51  64.15  64.47
 10000    60.35  57.50  57.33       61.67  58.26  59.22       70.04  65.74  64.61
Table 2. Prediction accuracy rates for discrete target variables in the Child and Sachs
standard BNs (http://www.bnlearn.com/bnrepository/) w.r.t. the percentage of ob-
served variables in the Markov blanket.
References
1. Boullé, M.: MODL: a Bayes optimal discretization method for continuous attributes. Machine Learning 65(1), 131–165 (2006)
2. Boullé, M.: Khiops: A statistical discretization method of continuous attributes. Machine Learning 55(1), 53–69 (2004)
3. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)
4. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proc. of ICML'95. pp. 194–202 (1995)
5. Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. of IJCAI'93. pp. 1022–1029 (1993)
6. Friedman, N., Goldszmidt, M.: Discretizing continuous attributes while learning Bayesian networks. In: Proc. of ICML'96. pp. 157–165 (1996)
7. Ide, J.S., Cozman, F.G., Ramos, F.T.: Generating random Bayesian networks with constraints on induced width. In: Proc. of ECAI'04. pp. 323–327 (2004)
8. Jiang, S., Li, X., Zheng, Q., Wang, L.: Approximate equal frequency discretization method. In: Proc. of GCIS'09. pp. 514–518 (2009)
9. Kerber, R.: ChiMerge: Discretization of numeric attributes. In: Proc. of AAAI'92. pp. 123–128 (1992)
10. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press (2009)
11. Kwedlo, W., Krętowski, M.: An evolutionary algorithm using multivariate discretization for decision rule induction. In: Proc. of PKDD'99. pp. 392–397 (1999)
12. Lauritzen, S., Wermuth, N.: Graphical models for associations between variables, some of which are qualitative and some quantitative. Annals of Statistics 17(1), 31–57 (1989)
13. Monti, S., Cooper, G.: A multivariate discretization method for learning Bayesian networks from mixed data. In: Proc. of UAI'98. pp. 404–413 (1998)
14. Monti, S., Cooper, G.: A latent variable model for multivariate discretization. In: Proc. of AIS'99. pp. 249–254 (1999)
15. Moral, S., Rumí, R., Salmerón, A.: Mixtures of truncated exponentials in hybrid Bayesian networks. In: Proc. of ECSQARU'01. Lecture Notes in Artificial Intelligence, vol. 2143, pp. 156–167 (2001)
16. Ratanamahatana, C.: CloNI: Clustering of sqrt(n)-interval discretization. In: Proc. of Int. Conf. on Data Mining & Comm Tech. (2003)
17. Ruichu, C., Zhifeng, H., Wen, W., Lijuan, W.: Regularized Gaussian mixture model based discretization for gene expression data association mining. Applied Intelligence 39(3), 607–613 (2013)
18. Schwarz, G.: Estimating the dimension of a model. Annals of Statistics 6(2), 461–464 (1978)
19. Shachter, R.: Evaluating influence diagrams. Operations Research 34(6), 871–882 (1986)
20. Shenoy, P., West, J.: Inference in hybrid Bayesian networks using mixtures of polynomials. International Journal of Approximate Reasoning 52(5), 641–657 (2011)
21. Song, D., Ek, C., Huebner, K., Kragic, D.: Multivariate discretization for Bayesian network structure learning in robot grasping. In: Proc. of ICRA'11. pp. 1944–1950 (2011)
22. Zighed, D., Rabaséda, S., Rakotomalala, R.: FUSINTER: a method for discretization of continuous attributes. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6(3), 307–326 (1998)