Multi-level Training of
Binary Morphological Operators
Nina S. T. Hirata
Abstract—The design of binary morphological operators that are translation-invariant and locally defined by a finite neighborhood window corresponds to the problem of designing Boolean functions. As in any supervised classification problem, morphological operators designed from training samples also suffer from overfitting. Large neighborhoods tend to lead to performance degradation of the designed operator. This work proposes a multi-level design approach to deal with the issue of designing operators based on large neighborhoods. The main idea is inspired by stacked generalization (a multi-level classifier design approach) and consists in, at each training level, combining the outcomes of the operators of the previous level. The final operator is a multi-level operator that ultimately depends on a larger neighborhood than that of the individual operators that have been combined. Experimental results show that two-level operators obtained by combining operators designed on subwindows of a large window consistently outperform the single-level operators designed on the full window. They also show that iterating two-level operators is an effective multi-level approach to obtain better results.
Index Terms—Image processing, pattern recognition, machine learning, classifier design and evaluation, morphological operator,
Boolean function, image operator learning, multi-level training, stacked generalization.
1 INTRODUCTION
Morphological operators are nonlinear signal and image processing tools with applications in a variety of fields such as biological and biomedical image processing, geoscience, remote sensing, industrial systems, and document processing, among others [1], [2], [3]. Many of
these operators such as erosions, dilations, openings and
closings, are parameterized by subsets called structuring
elements, used to locally probe the input images. The
output of the operator at each location depends on the
relationship between the structuring element and the
image [2], [3], [4], [5].
Morphological image operators are usually designed
on a trial and error basis by composing several sim-
pler operators, each one with an appropriate structur-
ing element. Design success depends on the expertise
of the designer. Another design approach consists of
procedures based on training techniques [6], [7], [8], [9],
[10], [11]. Pairs of training images, consisting of an image
before processing and its respective desired processing
result, are used to estimate the parameters of an operator.
Specifically, some representation for the operators is
assumed, and the training process adjusts the parameters
in order to find, among the operators that comply with
N. S. T. Hirata is with the Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Brazil.
E-mail: nina@ime.usp.br
© 2007 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted version; the published version DOI is 10.1109/TPAMI.2008.118.
the adopted representation, one that optimizes some
performance criterion. Several optimization algorithms
are used, such as linear programming [12], [13], genetic
algorithms [7], [14], [15], decision trees [16], adaptive
algorithms [17], among others.
In this work, the representation considered is the
canonical decomposition. According to the canonical de-
composition theorem, any translation-invariant operator
can be expressed uniquely as a supremum of a set
of interval operators [18]. By imposing local definition by a neighborhood (expressed in terms of a window W), it can be shown that all structuring elements of the interval operators of a translation-invariant image operator are subsets of W [19], [20]. These operators are called W-operators and are locally characterized; that is, the output at any location is determined by a function that depends solely on the pixel values in the neighborhood W. Hence, the problem of designing a W-operator can be seen as a problem of designing simple pattern classifiers (classifiers that map patterns within W into an appropriate output pixel value). In binary morphology, W-operators are equivalent to the class of Boolean functions on |W| (the cardinality of the set W) variables [21], [22].
By considering a sufficiently large neighborhood,
it is possible, at least theoretically, to represent any
translation-invariant operator via W-operators [22]. In an
ideal setting, a sufficiently large window (say as large as
the largest objects in the images) should be considered
and training techniques could be used to obtain an
operator from training data. However, it is a well known
fact that due to overfitting the performance of learning
algorithms degrades as the dimension of the patterns
increases [23], [24]. This situation, in the context of binary morphological image operator design, is illustrated in Fig. 1.

[Fig. 1. The U-shaped error curve indicates overfitting: MAE versus window size (in pixels), for 1, 2 and 5 training images.]

The horizontal axis represents the window size (in pixels) while the vertical axis represents the mean absolute error (MAE) computed on an independent test set.
Each curve corresponds to the mean error of ten training
runs, using different sets with the same number of
training images. The graph shows that, as the dimension
(window size) increases, the error initially drops very
fast, reaches a minimum point, and starts to increase
slowly. Such behavior, which results in a U-shaped error curve, occurs irrespective of the amount of training data in practical situations (finite sample). Standard deviation is
shown for the upper and lower curves.
Large error for small windows is due to their inability
to distinguish larger patterns while for the large win-
dows it is due to insufficiency of training data. Thus,
for a fixed amount of training data, simply increasing
window size does not work as a way to decrease the
error. A possible solution would be to increase the amount of training data until a satisfactory error rate is reached. However, a linear increase in the amount of training data does not result in a linear increase in performance. Moreover, training data may not be readily available and may require significant additional effort to be prepared, or even involve costs that cannot be neglected. Another
issue to be considered is the fact that computational cost
tends to increase not only with window size, but also
with the amount of training data.
This work reports investigations concerned with ob-
taining, from a given fixed amount of training data, a
morphological image operator with better performance
than the one that corresponds to the optimal window
(minimum point) in the U-shaped error curve. The study
is restricted to binary morphological image operators.
The training algorithm used is Boolean function mini-
mization, described in [25]. It should be noted that while
this work addresses the design of translation-invariant
morphological operators that are locally defined within
a neighborhood window, most of the works cited above
are restricted to specific subclasses of these operators.
Attempts to deal with training data limitation include
use of prior knowledge such as knowledge on im-
age operator properties (for instance, anti-extensiveness
or increasingness [13]) and knowledge on the data
model [26], and iterative design techniques [9]. The use
of knowledge on properties of the desired operator may, however, lead to overly complex training algorithms, as is the case of stack filter design, in which a large number of constraints must be satisfied [12], [13]. On the other hand, an image model is seldom known, and modeling image data is not a trivial task. As for the iterative design approach [9], which results in sequentially composed operators, it does show improvement of the results with respect to the single non-composed operators. In this work, a generalization of the iterative approach, namely a multi-level training approach inspired by classifier combination techniques [27], is proposed. More specifically, the proposed model is based on stacked generalization, introduced by Wolpert [28]. The proposed model
is flexible enough to accommodate several multi-level
operator composition architectures.
Some preliminary results related to this approach have
been presented in [29]. This paper formalizes the idea by proposing a training model and discusses the relation
of the proposed model to some previously known ap-
proaches. In addition, new application examples are
presented.
The paper is organized as follows. In Section 2, some
definitions and notations are introduced, and the rela-
tionship between W-operators and Boolean functions is
recalled. Since the interest is in designing morphological
operators, the notion of optimal mean absolute error
(MAE) W-operators and a basic training methodology
used to estimate optimal MAE morphological operators
from training data are described. In Section 3, a multi-
level training approach is introduced by first considering
two-level training and then generalizing it to multiple
levels of training. Its relation to stacked generalization
as well as the fact that it generalizes the iterative design
approach are discussed in this section. In Section 4,
experimental settings and several results that show the
superiority of multi-level over a single-level operator
design, both in terms of error and computation time,
are presented. Experimental results show that the two-
level approach consistently gives better results than
single-level operators. This section also includes some
examples of multi-level training. Images from several
application examples are presented in the Appendix.
Section 5 presents some concluding remarks and future
research steps of this work.
2 BACKGROUND
This section initially recalls the equivalence between
translation-invariant binary morphological operators
and Boolean functions. This equivalence, together with some statistical assumptions, allows us to model the problem of designing morphological operators as a problem of designing Boolean functions. A basic design procedure for mean absolute error minimization, first described in [8] and improved in subsequent years, will be presented in order to provide a thorough overview of the current state of the art on this subject.
2.1 Binary W-operators
Let E = Z². Binary images defined on E are mappings of the form f : E → {0, 1}. A binary image defined on E can be equivalently represented by a subset S ⊆ E. The collection of all binary images defined on E (all subsets of E) will be denoted by P(E). Binary images will be represented by subsets S ∈ P(E), as they usually are treated in mathematical morphology. When convenient, the functional notation S(x) = 1, to indicate that x belongs to S (x is a foreground pixel), and S(x) = 0 otherwise (x is a background pixel), will be used. Elements of E are denoted by lowercase letters such as x and z. The translation of an image S ∈ P(E) by a vector z is defined by S_z = {x + z | x ∈ S}, where + is the usual vector addition in E.

A binary image operator is a mapping of the form Ψ : P(E) → P(E). Ψ is translation-invariant if, for any S ∈ P(E) and z ∈ E, [Ψ(S)]_z = Ψ(S_z). Ψ is locally defined with respect to a non-empty window W ⊆ E if z ∈ Ψ(S) ⟺ z ∈ Ψ(S ∩ W_z) for any S ∈ P(E) and z ∈ E.

Operators Ψ that are both translation-invariant and locally defined are called W-operators and can be characterized by a local function ψ : P(W) → {0, 1} as follows:

\[ z \in \Psi(S) \iff \psi(S_{-z} \cap W) = 1. \tag{1} \]

By assigning a binary variable x_i to each point w_i ∈ W, and setting x_i = 1 if and only if w_i ∈ S_{-z} ∩ W, the function ψ can be seen as a Boolean function (BF) on n = |W| variables.

This establishes the relationship between binary morphological operators and BFs and reduces the problem of designing binary W-operators to the problem of designing BFs.
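To make this correspondence concrete, the following sketch (Python; an illustration, not the implementation used in this work) applies a W-operator characterized by a Boolean function ψ to a binary image by sliding the window over it. The window offsets, the 0/1 array representation and the example function erosion_like are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of applying a W-operator defined by a Boolean function psi.
# The window is given as a list of (row, col) offsets relative to the origin.
def apply_w_operator(image, window, psi):
    """image: 2D numpy array with values in {0, 1}; psi: maps a tuple of |W| bits to 0 or 1."""
    h, w = image.shape
    out = np.zeros_like(image)
    drs = [dr for dr, _ in window]
    dcs = [dc for _, dc in window]
    # visit only locations where the translated window fits inside the image domain
    for r in range(max(0, -min(drs)), h - max(0, max(drs))):
        for c in range(max(0, -min(dcs)), w - max(0, max(dcs))):
            pattern = tuple(image[r + dr, c + dc] for dr, dc in window)
            out[r, c] = psi(pattern)
    return out

# Example: a 5-point cross window (similar to the window of Fig. 2) and an
# erosion-like Boolean function (output 1 only if all window pixels are 1).
cross = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
erosion_like = lambda x: int(all(x))
```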
2.2 Mean Absolute Error Optimality
Let (S, I) denote a random process of jointly stationary image processes S and I, with realizations (S, I), where S corresponds to an observed image (i.e., an image to be processed) and I corresponds to the respective ideal outcome (i.e., the respective desired processing result). The mean absolute error (MAE) of a W-operator Ψ, characterized by a BF ψ, with respect to the process (S, I), is defined as the expected value of the absolute difference between Ψ(S) and I at an arbitrary location z, i.e.,

\[ \mathrm{MAE}\langle\Psi\rangle = E\big[\,|\psi(S_{-z} \cap W) - I(z)|\,\big]. \tag{2} \]

The process S_{-z} ∩ W is a random set, which can be thought of as a random vector X_z by associating each element of S_{-z} ∩ W to a component of X_z. Similarly, the value of a given pixel z in I can be thought of as a realization of a random variable y_z. Due to stationarity, z may be dropped from X_z and from y_z, resulting in a process (X, y) that locally characterizes (S, I). The joint distribution of (X, y) will be denoted P(X, y).

With these assumptions, MAE⟨Ψ⟩ can be expressed as

\[ \mathrm{MAE}\langle\Psi\rangle = E\big[\,|\psi(X) - y|\,\big], \tag{3} \]

with respect to the joint distribution P(X, y).

It can be shown that the optimal MAE operator [30] is the one characterized by the function defined, for any X ⊆ W, by

\[ \psi(X) = \begin{cases} 1, & \text{if } P(X, 1) > P(X, 0), \\ 0, & \text{if } P(X, 0) > P(X, 1), \\ 0 \text{ or } 1, & \text{if } P(X, 0) = P(X, 1). \end{cases} \tag{4} \]

This equation defines a BF that can be straightforwardly expressed in its canonical sum-of-products form.
2.3 Basic training method
Although the MAE-optimal operator can be characterized in terms of the joint distribution of the input-output processes, such a distribution is usually not known. Thus, a natural solution is to estimate these probabilities from training images and to use in Eq. 4 the estimated probabilities instead of the true ones.
Let (S_i, I_i) denote pairs of training images and W a non-empty window. An example of a pair of training images and a window is shown in Fig. 2.

[Fig. 2. Example of (a) a pair of training images (S, I), and (b) a neighborhood window W. Image size is 64 × 64. x_i, i = 1, ..., 5, indicate the binary variables assigned to the points of W.]

The training methodology used in this work is composed of three steps, described next. Each step is illustrated for the example given in Fig. 2.
1) Slide W over each input image S_i and record the pair ((S_i)_{-z} ∩ W, I_i(z)) for each location z (except those at which the window is not entirely inside the image domain), and count the occurrences of each pair (X, y) ∈ P(W) × {0, 1}. This step yields an estimate P̂(X, y) of P(X, y). For the above example, 3844 pairs of the form ((S_i)_{-z} ∩ W, I_i(z)) are observed (one-pixel-wide margins on all sides of the image are not considered), resulting in the following frequency table:
[Frequency table for the example of Fig. 2: each window pattern X observed in the training images, together with the number of times it was observed with y = 0 and with y = 1 (3844 observations in total).]
2) For each pattern X observed in step 1, decide ψ̂(X) = 1 if P̂(X, y = 1) > P̂(X, y = 0) and ψ̂(X) = 0 otherwise. Had all possible patterns been observed, the BF corresponding to the optimal MAE W-operator would be completely defined. In practice, not all patterns are observed. [For the above example, the elements for which ψ̂(X) = 1 and those for which ψ̂(X) = 0 are shown graphically in the omitted figure; the remaining elements have not been observed in the training images.]
3) In order to obtain an operator that will be able
to classify patterns that have not been observed
during training, a generalization (or training) al-
gorithm must be applied. In this work, an algo-
rithm for the minimization of incompletely speci-
fied BFs [25] is used. The minimization procedure
results in a BF whose values for the observed
patterns are exactly as defined in step 2. The val-
ues for the non-observed patterns (don’t cares in
the terminology of switching functions [31]) are
determined by the minimization algorithm. Details
of the algorithm used in this work can be found
in [25].
After minimization, for the above example, the resulting BF is ψ(x_1, x_2, x_3, x_4, x_5) = x_3x_5 + x_3x_4 + x_3x_2 + x_3x_1. [The four product terms correspond to intervals, shown graphically in the omitted figure.]
Figure 3 shows the result obtained by applying the
trained operator on a test image.
[Fig. 3. Application of the operator learned from the examples of Fig. 2: test image (left) and respective result (right).]

From a pattern recognition point of view, the design process above can be seen as a procedure for determining a binary classifier, where the patterns are those observed through W in the input images and the corresponding class is the pixel value (y ∈ {0, 1}) in the ideal output image. Thus, although this work uses BF minimization as a learning model, any other model, such as neural networks or support vector machines, could be used. Another comment is that step 2 of the procedure described above could simply be ignored. By doing that, one may have a set of examples with conflicting class labels. As long as the learning algorithm is able to deal with that, there are no further difficulties. The advantage of using BF minimization is that the resulting product terms correspond to interval operators, which can be readily interpreted morphologically [21], [32].
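As a concrete illustration of steps 1 and 2 (and of the classifier view just described), the sketch below estimates P̂(X, y) by sliding the window over training pairs and takes the plug-in decision of Eq. 4. It is a simplified stand-in, not the ISI minimization algorithm of [25]: step 3 (generalization to unobserved patterns) is replaced here by a default output of 0. The helper apply_w_operator from the previous sketch can be used to apply the resulting function.

```python
from collections import defaultdict

# Sketch of steps 1 and 2: frequency estimation and the plug-in MAE-optimal decision.
def train_plugin_bf(pairs, window):
    """pairs: list of (S, I) 2D 0/1 numpy arrays; window: list of (dr, dc) offsets."""
    counts = defaultdict(lambda: [0, 0])              # pattern X -> [count(y=0), count(y=1)]
    drs = [dr for dr, _ in window]
    dcs = [dc for _, dc in window]
    for S, I in pairs:
        h, w = S.shape
        for r in range(max(0, -min(drs)), h - max(0, max(drs))):
            for c in range(max(0, -min(dcs)), w - max(0, max(dcs))):
                X = tuple(S[r + dr, c + dc] for dr, dc in window)
                counts[X][int(I[r, c])] += 1          # step 1: estimate of P(X, y)
    decided = {X: int(n1 > n0) for X, (n0, n1) in counts.items()}  # step 2 (ties -> 0)
    return lambda X: decided.get(X, 0)                # unobserved patterns default to 0
```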
Moreover, for the design of image operators, the space
of operators (class of classifiers) can be constrained based
on prior knowledge about properties of the desired
operator. For instance, in the case of binary operators, a very useful constraint is to consider anti-extensive operators, that is, operators such that Ψ(S) ⊆ S. This property guarantees that the resulting image is a subset of the input image. In this case, in step 1 of the procedure described above, one only needs to slide the window over the foreground pixels of the input image and, in step 3, the minimization procedure can safely assume ψ(X) = 0 for all elements X in the interval [∅, W \ {o}] (i.e., all patterns that do not contain the origin o). As a consequence, the number of patterns considered in the minimization process, as well as the overall processing cost, may be significantly reduced. Another example is the class of increasing operators, i.e., those such that S₁ ⊆ S₂ implies Ψ(S₁) ⊆ Ψ(S₂). In this case, it can be shown that the characteristic BFs are positive [30], [33].
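Under the same illustrative assumptions as the previous sketches, the anti-extensive constraint can be imposed as below: training patterns are collected only at foreground pixels, and any pattern whose origin component is 0 is mapped to 0, which guarantees Ψ(S) ⊆ S. The position of the origin within the window list is an assumption of this sketch.

```python
from collections import defaultdict

# Sketch of training under the anti-extensive constraint (Psi(S) is a subset of S).
def train_antiextensive_bf(pairs, window, origin_index=0):
    """Assumes window[origin_index] == (0, 0)."""
    counts = defaultdict(lambda: [0, 0])
    drs = [dr for dr, _ in window]
    dcs = [dc for _, dc in window]
    for S, I in pairs:
        h, w = S.shape
        for r in range(max(0, -min(drs)), h - max(0, max(drs))):
            for c in range(max(0, -min(dcs)), w - max(0, max(dcs))):
                if S[r, c] == 0:
                    continue                          # slide only over foreground pixels
                X = tuple(S[r + dr, c + dc] for dr, dc in window)
                counts[X][int(I[r, c])] += 1
    decided = {X: int(n1 > n0) for X, (n0, n1) in counts.items()}
    # patterns that do not contain the origin are forced to 0, enforcing anti-extensiveness
    return lambda X: decided.get(X, 0) if X[origin_index] == 1 else 0
```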
As mentioned in the introduction, if we gradually in-
crease the window size, the operators designed on large
windows tend to present poorer performance in terms of
MAE than those designed on moderate size windows.
This is a common phenomenon known as overfitting
(also strongly related to the phenomenon known as
the curse of dimensionality). Therefore, designing im-
age operators based on the procedure described above
requires balancing window size and increase in MAE:
an arbitrarily large window results in larger MAE due
to generalization error while a too small window results
in larger MAE simply because no better MAE can be
achieved due to its inability to discriminate larger pat-
terns.
Finding the best window for a given fixed amount
of training data is still an open problem. Some works
have proposed heuristics to treat this problem [34]. In
the pattern recognition field, this corresponds to the
feature selection problem [35]. Even supposing we are
able to compute the best window for a given amount
of training data, there are situations in which such a
window is not large enough to reach an acceptable
MAE. In these cases, in order to reduce MAE, the only
possibility is to consider a larger space of operators,
one in which operators are based on larger windows. If
the above procedure is used with larger windows, MAE
will be larger due to generalization error (since we are
considering that the amount of training data remains
the same). Thus, the main question now is whether it
is possible to, using the same amount of training data,
design an image operator with smaller empirical MAE.
3 MULTI-LEVEL TRAINING OF MORPHOLOGICAL OPERATORS
The idea of multi-level training is inspired by classifier combination approaches. Distinct operators based on different windows are designed, and then another operator that combines their outcomes is designed. The main question is whether the combination is able to produce better results than any of the individual operators, and also than the operator trained with the window that corresponds to the union of all individual windows.
3.1 The proposed model
To introduce notation, we start by explaining how a two-level approach could be modeled. In the first level, n_1 operators, each one based on its own window W_i, are trained. In the second level, the outcomes of the first-level operators are combined to train a level-2 operator. The resulting operator is a two-level operator. An example is shown in Fig. 4. Three sub-windows W_1, W_2 and W_3 are considered, and three level-1 operators, denoted respectively Ψ^{(1)}_1, Ψ^{(1)}_2 and Ψ^{(1)}_3, are designed based on each of the sub-windows. Realizations of the output image processes Ψ^{(1)}_i(S), i = 1, 2, 3, are the images that will be used for the training of the level-2 operator. Notice that, in this particular case, the union of the three sub-windows is equal to a larger window W. Therefore, by combining the outputs of Ψ^{(1)}_1, Ψ^{(1)}_2 and Ψ^{(1)}_3, the level-2 operator Ψ^{(2)} indirectly makes use of information under W in the input image process S.

[Fig. 4. A two-level operator. Level-1 operators Ψ^{(1)}_1, Ψ^{(1)}_2 and Ψ^{(1)}_3 are based on windows W_1, W_2 and W_3, respectively. Although depicted separately, the input image to all three level-1 operators is the same; what differs is the neighborhood taken as input (shown in dark gray) by each operator. The level-2 operator Ψ^{(2)} takes as input three pixels, one from each output Ψ^{(1)}_i(S), i = 1, 2, 3 (shown in dark gray), of the level-1 operators.]

In the schema shown in Fig. 4, exactly one pixel, the one at the same location as the target pixel, is taken from each level-1 operator output in the second-level training. However, in a more general setting, more than one pixel of each of the level-1 outcomes could be considered. Moreover, some pixels of the original input S could also be considered in the level-2 operators. These considerations lead to the general multi-level training approach described next; a small training sketch for the two-level case of Fig. 4 is given below.
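The sketch below is a minimal illustration of this two-level scheme, reusing the illustrative helpers apply_w_operator and train_plugin_bf from the sketches in Section 2 (all names are assumptions of the sketch, not the implementation used in this work): level-1 operators are trained on sub-windows with one part of the data, and the level-2 operator is trained on single pixels of their outputs with the remaining part.

```python
import numpy as np
from collections import defaultdict

# Sketch of the two-level scheme of Fig. 4, built on the Section 2 sketches.
def train_two_level(level1_pairs, level2_pairs, subwindows):
    # Level 1: one plug-in Boolean function per sub-window.
    level1_bfs = [train_plugin_bf(level1_pairs, w) for w in subwindows]

    def level1_outputs(S):
        # apply every level-1 operator to the same input image; stack the results
        return np.stack([apply_w_operator(S, w, psi)
                         for w, psi in zip(subwindows, level1_bfs)], axis=-1)

    # Level 2: the pattern at a pixel is the vector of level-1 outputs at that
    # same pixel (i.e., W^(2)_1(S^(1)_i) = {o}), trained on held-out image pairs.
    counts = defaultdict(lambda: [0, 0])
    for S, I in level2_pairs:
        stack = level1_outputs(S)
        for r in range(stack.shape[0]):
            for c in range(stack.shape[1]):
                counts[tuple(stack[r, c])][int(I[r, c])] += 1
    level2_bf = {X: int(n1 > n0) for X, (n0, n1) in counts.items()}

    def apply(S):
        stack = level1_outputs(S)
        out = np.zeros(S.shape, dtype=int)
        for r in range(stack.shape[0]):
            for c in range(stack.shape[1]):
                out[r, c] = level2_bf.get(tuple(stack[r, c]), 0)
        return out

    return apply
```

Splitting the training images between the two levels, as done here, mirrors the protocol of Section 4.1, where one part of the training data is used for the level-1 operators and the remainder for the second level.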
3.1.1 Model flexibility
In general, the input to an operator at any level may come from the outputs of any previous levels, including the initial input data. Therefore, we denote by n_l the number of operators at level l and by Ψ^{(l)}_i the i-th operator at that level, i = 1, 2, ..., n_l. The input of Ψ^{(l)}_i is defined by a set of windows W^{(l)}_i = {W^{(l)}_i(S^{(t)}_j) | 0 ≤ t < l and, for each t, 1 ≤ j ≤ n_t}. A window W^{(l)}_i(S^{(t)}_j) indicates that a neighborhood of the j-th output of level t is part of the input of Ψ^{(l)}_i. Output of level 0, as in W^{(l)}_i(S^{(0)}_1), corresponds to the original input image.
Figure 5 shows a diagram representation of a general three-level operator. At level 1, there are 3 operators, Ψ^{(1)}_1, Ψ^{(1)}_2 and Ψ^{(1)}_3. They are based, respectively, on windows W^{(1)}_1(S^{(0)}_1), W^{(1)}_2(S^{(0)}_1) and W^{(1)}_3(S^{(0)}_1). Window W^{(1)}_1(S^{(0)}_1) indicates that part of the input of Ψ^{(1)}_1 comes from output 1 of level 0 (that is, the original input data S^{(0)}_1). At the second level, there are two operators, Ψ^{(2)}_1 and Ψ^{(2)}_2. The former is assigned four windows, namely W^{(2)}_1(S^{(0)}_1), W^{(2)}_1(S^{(1)}_1), W^{(2)}_1(S^{(1)}_2) and W^{(2)}_1(S^{(1)}_3). That means that part of the input data comes from S^{(0)}_1, part from S^{(1)}_1, part from S^{(1)}_2 and part from S^{(1)}_3. The input size of Ψ^{(2)}_1 is given by |W^{(2)}_1(S^{(0)}_1)| + |W^{(2)}_1(S^{(1)}_1)| + |W^{(2)}_1(S^{(1)}_2)| + |W^{(2)}_1(S^{(1)}_3)|. At the third level, parts of every output of the previous levels, plus the original input, can be taken as input. The window size of an operator Ψ^{(l)}_i is given by

\[ \sum_{k=0}^{l-1} \sum_{j=1}^{n_k} \left| W^{(l)}_i\!\left(S^{(k)}_j\right) \right|. \tag{5} \]

From a practical point of view, most of the windows involved in the schema should be empty or very small.
Otherwise, a large pattern would be formed to train the higher-level operators, and that may result in overfitting.

[Fig. 5. A schema for three-level learning of image operators: one input process, three level-1 operators, two level-2 operators, and one level-3 operator. The output is the process represented by S^{(3)}_1.]

[Fig. 6. The result of the operator at any target pixel depends on a neighborhood larger than the ones defined by the individual windows.]
3.1.2 Window support
Since multi-level operators are sequential compositions of operators, they are based on windows that are usually larger than the windows of their components. To understand how large the window of the final operator is, recall that the Minkowski sum of two sets A and B is given by

\[ A \oplus B = \{x + y \mid x \in A \text{ and } y \in B\}. \tag{6} \]

To simplify notation, the one-dimensional domain example shown in Fig. 6 is considered. Suppose the target location is x = x_0. There are three level-1 operators, characterized respectively by BFs ψ^{(1)}_1, ψ^{(1)}_2 and ψ^{(1)}_3. Their windows, W^{(1)}_1, W^{(1)}_2 and W^{(1)}_3, are all 3-point, with origin at the rightmost, center and leftmost pixels, respectively. The second-level operator Ψ^{(2)} takes one pixel of each of the level-1 operators' outcomes. Thus, we have Ψ = Ψ^{(2)}(Ψ^{(1)}_1, Ψ^{(1)}_2, Ψ^{(1)}_3), i.e.,

\[ [\Psi(S)](x) = \psi^{(2)}\big([\Psi^{(1)}_1(S)](x),\, [\Psi^{(1)}_2(S)](x),\, [\Psi^{(1)}_3(S)](x)\big) = \psi^{(2)}(y_1, y_2, y_3), \tag{7} \]

where y_1 = ψ^{(1)}_1(S(x_{-2}), S(x_{-1}), S(x_0)), y_2 = ψ^{(1)}_2(S(x_{-1}), S(x_0), S(x_1)), and y_3 = ψ^{(1)}_3(S(x_0), S(x_1), S(x_2)). Therefore, the result of the final operator at location x depends on the set of pixels of S at {x_{-2}, x_{-1}, x_0, x_1, x_2} (pixel x plus two neighboring pixels on both the left and the right side of x), which can be expressed via Minkowski sums as ({x_{-2}, x_{-1}, x_0} ⊕ {0}) ∪ ({x_{-1}, x_0, x_1} ⊕ {0}) ∪ ({x_0, x_1, x_2} ⊕ {0}). Thus, although each level-1 operator depends on a 3-point window, and the second operator depends on one pixel of each level-1 operator's outcome, the composed final operator Ψ depends (indirectly) on a 5-point window.

When the level-2 operator considers pixels beyond the one at the target location, a larger window dependence results. In Fig. 7, an example similar to the one in Fig. 6 is shown.

[Fig. 7. The result of the operator at any target pixel depends on a neighborhood larger than the ones defined by the individual windows.]

The level-2 operator Ψ^{(2)} takes as input the pixels at x_{-1} from S^{(1)}_1, at x_0 from S^{(1)}_2, and at x_1 from S^{(1)}_3. The dependence can be expressed as

\[ [\Psi(S)](x) = \psi^{(2)}\big([\Psi^{(1)}_1(S)](x_{-1}),\, [\Psi^{(1)}_2(S)](x_0),\, [\Psi^{(1)}_3(S)](x_1)\big) = \psi^{(2)}(y_1, y_2, y_3), \tag{8} \]

where y_1 = ψ^{(1)}_1(S(x_{-3}), S(x_{-2}), S(x_{-1})), y_2 = ψ^{(1)}_2(S(x_{-1}), S(x_0), S(x_1)), and y_3 = ψ^{(1)}_3(S(x_1), S(x_2), S(x_3)). In this case, the dependence is given by the 7-point neighborhood {x_{-3}, x_{-2}, x_{-1}, x_0, x_1, x_2, x_3}, which can be expressed as ({x_{-2}, x_{-1}, x_0} ⊕ {-1}) ∪ ({x_{-1}, x_0, x_1} ⊕ {0}) ∪ ({x_0, x_1, x_2} ⊕ {1}).
We define the window support of an operator with respect to an input process as the input-image pixel neighborhood that is taken into consideration by the operator (directly or indirectly) to produce the output pixel value at any arbitrary location. Supposing only one input process S^{(0)}_1, level-1 operators depend solely on the initial input data S^{(0)}_1. Thus, the window support of any level-1 operator Ψ^{(1)}_i is the window W^{(1)}_i = W^{(1)}_i(S^{(0)}_1). For operators of level 2, input may come from outputs of level-1 operators as well as from the initial input S^{(0)}_1. Thus, the window support of a level-2 operator Ψ^{(2)}_i is W^{(2)}_i = ⋃_{j=1}^{n_1} (W^{(1)}_j ⊕ W^{(2)}_i(S^{(1)}_j)) ∪ W^{(2)}_i(S^{(0)}_1).

In the following, we present a recurrent formula that describes the window support W^{(l)}_i of the i-th operator at level l, with respect to an input process S^{(0)}_a, a = 1, ..., n_0:

\[ W^{(1)}_i = W^{(1)}_i\big(S^{(0)}_a\big), \quad i = 1, 2, \ldots, n_1, \]
\[ W^{(l)}_i = W^{(l)}_i\big(S^{(0)}_a\big) \cup \bigcup_{k=1}^{l-1} \bigcup_{j=1}^{n_k} \Big[ W^{(k)}_j \oplus W^{(l)}_i\big(S^{(k)}_j\big) \Big] \tag{9} \]

for any l > 1 and 1 ≤ i ≤ n_l.
As an example, let us compute the window support of the operator given by the architecture shown in Fig. 7. We know that W^{(1)}_j = W^{(1)}_j(S^{(0)}_1). More precisely, W^{(1)}_1(S^{(0)}_1) = {-2, -1, 0} (origin at the right extreme), W^{(1)}_2(S^{(0)}_1) = {-1, 0, +1} (origin at the center) and W^{(1)}_3(S^{(0)}_1) = {0, +1, +2} (origin at the left extreme). Thus, since l = 2 and n_1 = 3 (and W^{(2)}_1(S^{(0)}_1) = ∅ in this architecture),

\[
\begin{aligned}
W^{(2)}_1 &= W^{(2)}_1\big(S^{(0)}_1\big) \cup \bigcup_{k=1}^{1}\bigcup_{j=1}^{n_k}\Big[W^{(k)}_j \oplus W^{(2)}_1\big(S^{(k)}_j\big)\Big] \\
          &= W^{(2)}_1\big(S^{(0)}_1\big) \cup \bigcup_{j=1}^{3}\Big[W^{(1)}_j \oplus W^{(2)}_1\big(S^{(1)}_j\big)\Big] \\
          &= W^{(2)}_1\big(S^{(0)}_1\big) \cup \Big[W^{(1)}_1 \oplus W^{(2)}_1\big(S^{(1)}_1\big)\Big] \cup \Big[W^{(1)}_2 \oplus W^{(2)}_1\big(S^{(1)}_2\big)\Big] \cup \Big[W^{(1)}_3 \oplus W^{(2)}_1\big(S^{(1)}_3\big)\Big] \\
          &= (\{-2,-1,0\} \oplus \{-1\}) \cup (\{-1,0,+1\} \oplus \{0\}) \cup (\{0,+1,+2\} \oplus \{+1\}) \\
          &= \{-3,-2,-1\} \cup \{-1,0,+1\} \cup \{+1,+2,+3\} \\
          &= \{-3,-2,-1,0,+1,+2,+3\},
\end{aligned}
\]

which is the 7-point window centered at the origin.
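The recurrence of Eq. 9 can also be evaluated with plain set operations; the sketch below (an illustration, not part of the original work) reproduces the Fig. 7 architecture with 1-D windows represented as sets of integer offsets.

```python
# Sketch of the window support recurrence (Eq. 9) using Minkowski sums of
# windows represented as Python sets of integer offsets.
def minkowski_sum(a, b):
    return {x + y for x in a for y in b}

def window_support(component_windows, supports_prev):
    """component_windows: dict (level, j) -> window W^(l)_i(S^(level)_j), with
    level 0 denoting the original input; supports_prev: dict (level, j) ->
    window support of operator j at that level."""
    support = set(component_windows.get((0, 1), set()))          # part taken from S^(0)_1
    for (k, j), w in component_windows.items():
        if k >= 1:
            support |= minkowski_sum(supports_prev[(k, j)], w)   # W^(k)_j (+) W^(l)_i(S^(k)_j)
    return support

# Fig. 7 example: three 3-point level-1 windows with different origins, and a
# level-2 operator taking one pixel at offsets -1, 0, +1 from their outputs.
level1_supports = {(1, 1): {-2, -1, 0}, (1, 2): {-1, 0, 1}, (1, 3): {0, 1, 2}}
level2_components = {(1, 1): {-1}, (1, 2): {0}, (1, 3): {1}}
print(sorted(window_support(level2_components, level1_supports)))
# -> [-3, -2, -1, 0, 1, 2, 3], the 7-point window computed in the text
```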
3.2 Relation to Stacked Generalization
In the field of machine learning, a multi-level training
approach known as stacked generalization was proposed
by Wolpert [28]. Multiple levels of training are performed, such that at the initial level some classifiers are obtained from the training data and at the other levels they are obtained from the outputs of the classifiers of the previous levels, possibly together with part of the original input data. While usual approaches consider simple
combination strategies like majority vote, in stacked
generalization the proposal is to perform another level of
training, precisely to learn how to combine the outcomes
of the classifiers of the previous levels.
Despite the similarities of the multi-level design proposed here with stacked generalization, one difference should be pointed out. Classifiers usually generate only one output, namely the class to be assigned to the input pattern. Therefore, when combining the outputs of the previous levels to form the patterns for the current level, at most one outcome from each classifier of the previous levels can be taken. With images, since the outcome of a previous-level operator is an image, information from neighboring pixels can also be taken, and thus richer combinations are possible (at the expense of generating higher-dimensional patterns).
3.3 Relation to iterative design
In iterative training [9], a sequence of operators that aim to successively refine the previous result is designed as follows. Suppose the initial training data set is given by pairs of images of the form (S_i, I_i) (which are realizations of the random processes S^{(0)} and I). In the first level of training, a W^{(1)}-operator Ψ^{(1)} is obtained in such a way as to minimize E[|Ψ^{(1)}(S^{(0)}) - I|], the MAE between the transformed image and its respective ideal image. In the second level of training, pairs of the form (Ψ^{(1)}(S_j), I_j) are considered for training. After k levels of training, the final operator consists of the composition Ψ(S) = Ψ^{(k)}(Ψ^{(k-1)}( ... (Ψ^{(2)}(Ψ^{(1)}(S))) ... )).

This is a particular case of the proposed schema, with only one operator per level. For three levels, the general model shown in Fig. 5 reduces to the one shown in Fig. 8.

[Fig. 8. Example of the specialization of the general schema of Fig. 5 to the iterative case, with only one operator per level: S^{(0)} → Ψ^{(1)} → S^{(1)} → Ψ^{(2)} → S^{(2)} → Ψ^{(3)} → S^{(3)}, with windows W^{(1)}, W^{(2)} and W^{(3)}.]

Considering that W^{(1)} = W^{(1)}(S^{(0)}), and since n_l = 1 for all l, the window support of any operator Ψ^{(l)} reduces to

\[ W^{(l)} = W^{(l-1)} \oplus W^{(l)}\big(S^{(l-1)}\big). \tag{10} \]

Expanding it, we have that W^{(k)} = W^{(1)} ⊕ W^{(2)} ⊕ ... ⊕ W^{(k)}. Notice that the union ⋃_{k=1}^{l-1} in the original recurrent formula becomes redundant here because W^{(k)} ⊆ W^{(k+1)} for any k.
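As a concrete instance of this expansion, two iterations of operators defined on 9 × 9 windows centered at the origin yield a window support of (9 × 9) ⊕ (9 × 9), i.e., a 17 × 17 window; this is precisely the support of the operators compared in Table 3 of Section 4.2.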
4 MODEL EVALUATION
The model presented in the previous section is flexible enough to allow several variations in the training architecture, namely:
• the number of levels of training,
• the number of classifiers in each level, and
• the inputs of each classifier (window sizes and respective training images).
Some of the possibilities have been experimentally
evaluated and the main results are reported in this
section. In this work, the number of input processes has been fixed to one (i.e., n_0 = 1).
4.1 Simple two-level operators
In order to check whether combination (i.e., two-level
operators) provides advantages over single-level oper-
ators, several experiments with distinct data sets have
been carried out. The training architectures used in these
experiments are shown in Fig. 9. For each experiment,
a window W and n_1 subwindows W^{(1)}_1, ..., W^{(1)}_{n_1} of W have been selected. A window W known from previous experience to be one that yields good results for the single-level operator has been chosen whenever such information was available. A single-level operator has been designed with respect to W. The n_1 level-1 operators of the two-level operator have been designed with respect to the subwindows W^{(1)}_i and then combined by the level-2 operator, which takes one pixel from each of the outcomes of the level-1 operators (i.e., W^{(2)}_1(S^{(1)}_i) = {o}, i = 1, ..., n_1).

[Fig. 9. Basic training architecture: one-level operator (left) and two-level operator with n_1 level-1 operators Ψ^{(1)}_1, ..., Ψ^{(1)}_{n_1} (right).]
For each data set, a fixed number of pairs of training
images and an independent set of test images have been
considered. For the training of single-level operators,
all training images were used, while for the training of
the two-level operators one part has been used for the
level-1 operators and the remainder for the second-level
operators. The same test images have been used for both
cases.
Table 1 describes the data sets used in these exper-
iments. Its first column presents a brief description of
the data set, in terms of the processing task. The second
column presents the total number of training images, followed by how they were distributed between the two levels of training. For instance, in the first row, 8 (5:3) means that a total of 8 images were used for the training of the one-level operator, while for the two-level operator 5 images were used in the level-1 training and 3 in the second-level training. The third column of the table indicates how many images were used for testing. The images used are not necessarily of the same size, but all images in a data set have been obtained from a common context using a common acquisition procedure (scanning parameters, thresholding parameters, etc.).
TABLE 1
Data sets used for training and estimation of MAE of two-level operators.

Description | # Training images | # Test images
A. Functional diagrams (circular object segmentation) | 8 (5:3) | 10
A′. Functional diagrams (dashed box segmentation) | 8 (5:3) | 10
A″. Functional diagrams (character segmentation) | 8 (5:3) | 10
B. Texture segmentation | 3 (2:1) | 2
C. Character segmentation | 10 (6:4) | 10
D. Text segmentation (magazine pages) | 5 (3:2) | 5
E. Text segmentation (book pages) | 5 (3:2) | 5
F. Boolean noise filtering | 5 (3:2) | 5
A summary of the results obtained for the different data sets is presented in Table 2. Each experiment is identified according to the respective data set used. For instance, Experiments C1 and C2 refer to data set C, and the indices 1 and 2 indicate experiments with distinct windows W. Column |W| indicates the size of W, while column n_1 indicates the number of level-1 operators used in the two-level operator. Training time is given in seconds, and MAE corresponds to the empirical MAE on the test images (relative number of pixels in the absolute difference between the operator result and the expected ideal image, averaged over the total of test images). Missing values in the training time and MAE fields of the one-level operators indicate that their training time had far exceeded the training time of the corresponding two-level operators when their execution was aborted. The subwindows used in each of the experiments are shown in Fig. 10.

With the exception of Experiment D1, for the cases in which both single-level and two-level operators have been designed, the latter present better performance both in terms of MAE and training time. Experiment D1, compared to Experiment D2, indicates that, if the subwindows are not large enough, two-level operators do not have better performance than single-level ones in terms of MAE. By using larger subwindows (Experiment D2), a two-level operator with better MAE is obtained, while the training time for the single-level operator on the corresponding support window proved to be prohibitively large.
TABLE 2
Training time (in seconds) and empirical MAE for different experiments (MAE averaged over the total of test images). Training time based on a CPU AMD Athlon 64 X2 4200 2.2 GHz, with 3 GB RAM.

Exp. | |W| | n_1 | #train. pixels | #test pixels | One-level train. time | One-level MAE | Two-level train. time | Two-level MAE
A  | 9 × 9   | 6 | 85967   | 102883  | 2701   | 0.015 | 1288  | 0.008
A′ | 25 × 25 | 6 | 85940   | 102883  | -      | -     | 420   | 0.025
A″ | 9 × 9   | 6 | 85967   | 102883  | 12615  | 0.06  | 6883  | 0.04
B  | 9 × 9   | 8 | 230533  | 104399  | 677476 | 0.07  | 169   | 0.04
C1 | 9 × 7   | 5 | 193445  | 197458  | 207    | 0.009 | 195   | 0.006
C2 | 11 × 9  | 5 | 193319  | 197474  | 33190  | 0.010 | 839   | 0.004
D1 | 9 × 7   | 5 | 1049540 | 783834  | 45518  | 0.040 | 20888 | 0.046
D2 | 11 × 11 | 7 | 1047219 | 783834  | -      | -     | 62118 | 0.031
E  | 11 × 11 | 7 | 176368  | 493755  | -      | -     | 48760 | 0.004
F  | 9 × 9   | 5 | 1270080 | 1260020 | 9828   | 0.006 | 662   | 0.003

[Fig. 10. Subwindows (shown as black circles) used in the level-1 operators in the experiments described in Table 2 (panels for A/A″, C1/D1, A′, C2, D2/E, B and F). The square at the center indicates the origin.]
Some test images and respective one-level and two-
level operator results are shown in the appendix.
To understand how level-2 operators combine their input data, Experiment D2 has been examined. The corresponding BF is composed of sixteen product terms (intervals) of the form [A, 1111111], A ∈ {0, 1}^7, where each of the seven components comes from the output of one of the level-1 operators. The extremities A are: 1111100, 1111001, 1110101, 1101101, 1101101, 0111101, 1110010, 1100011, 1010110, 1001011, 0111010, 0110110, 0101110, 0101011, 0011110, 0010111. A careful analysis shows that, for a pixel to be classified as 1 in the output, it must receive at least four votes from the level-1 operators. However, this condition is not sufficient; there are cases in which five votes are necessary. A curious fact is that the second operator in the first level seems to be a key element in determining when four or five votes are necessary. The effect of the level-1 operators can be seen in Fig. 11.

[Fig. 11. Experiment D2: input image, detail of the superposition of the level-1 operator results (the darkness of a pixel indicates the number of operators that assigned output 1 to that pixel), and resulting output image, respectively.]
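The vote-counting observation can be checked directly from the listed extremities (a small verification sketch, using the extremities exactly as listed above, including the repeated entry): ψ(X) = 1 exactly when X covers some extremity componentwise, so enumerating all 2^7 input patterns gives the minimum number of votes required and shows that four votes are not always sufficient.

```python
from itertools import product

# Extremities of the Experiment D2 level-2 BF, as listed in the text above.
extremities = ["1111100", "1111001", "1110101", "1101101", "1101101", "0111101",
               "1110010", "1100011", "1010110", "1001011", "0111010", "0110110",
               "0101110", "0101011", "0011110", "0010111"]

def psi2(x):  # x: tuple of 7 bits, one vote per level-1 operator
    return int(any(all(b >= int(a) for b, a in zip(x, A)) for A in extremities))

votes_when_one = [sum(x) for x in product((0, 1), repeat=7) if psi2(x)]
print(min(votes_when_one))                       # minimum number of votes for output 1 (4)
print(any(sum(x) == 4 and psi2(x) == 0           # some 4-vote patterns are still mapped to 0,
          for x in product((0, 1), repeat=7)))   # so four votes are not sufficient (True)
```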
4.2 Other architectures
According to the model, to combine previous-level operators, more than one pixel value from each of the previous-level results can be taken. The examples presented above may give the false impression that taking only one pixel from each image is the way the combination should be done. The following example shows a situation in which taking two pixels, instead of only one, from each input image results in better performance.

Experiment B has been repeated taking two pixels, instead of only one, from each output of the eight level-1 operators. The two-point windows W^{(2)}_1(S^{(1)}_i), i = 1, 2, ..., 8, are shown in Fig. 12.

[Fig. 12. Training architecture variation: windows used in the second level to combine the eight level-1 operator outputs.]

This experiment was repeated nine times, considering different partitions of the five images into subsets of three training images and two test images. Training considering one pixel from each input image in the second level resulted in operators with performance 30% worse on average than the corresponding operators that considered two pixels from each input image. The smallest difference against the one-pixel operator was 7.2%, while the largest reached 118%.
Another variation that has been tested is iterative training, as described in Section 3.3. In general, the first iteration produces a significant MAE drop, and then the decrease in MAE tends to diminish after the second iteration, starting to oscillate or even increase slightly as the number of iterations increases. Figure 13 shows an example of a sequence of images obtained by increasing the number of iterations, using windows of size 9×9, 5×9, 5×3 and 3×3, respectively. It can be seen that at each iteration the resulting image approximates the ideal one. In general, the quality of convergence depends on the amount of training data and on the sequence of windows used. See more details, for example, in [9]. A typical MAE curve as the number of iterations increases is shown in Fig. 14. The bold line is the average MAE and the light gray lines correspond to the MAE with respect to 10 test images for the above example.
[Fig. 13. From top to bottom and left to right: input and ideal image, followed by the outputs of four iterations with windows of size 9×9, 5×9, 5×3 and 3×3, respectively.]

[Fig. 14. MAE evolution through iterations (iteration number versus MAE): average MAE (in bold) and MAE with respect to 10 test images for the example illustrated in Fig. 13.]

As another way to explore the possibility of variations in the training architecture provided by the model, rather than simply iterating operators sequentially, it is possible to iterate two-level operators. A concrete example of such an architecture is shown in Fig. 15. This is a four-level operator that can also be understood as a two-level iteration of two-level operators. The window support of the level-4 operator Ψ^{(4)} is (9 × 9) ⊕ (9 × 9). In order to compare its performance with an operator of equivalent window support, an iterative design with two levels, both iterations on a 9 × 9 window, has been carried out. Table 3 shows the performance of these operators. This experiment was repeated three times, using different distributions of the training data between the levels. In the three cases, the result obtained was consistent with the one presented in Table 3, that is, the iteration of two-level operators performed better than the iteration of single-level operators.

[Fig. 15. Four-level architecture: n_1 = 6, n_2 = 1, n_3 = 5, and n_4 = 1. The windows of the level-1 operators are the same as in Experiment A; the level-2 operator takes one pixel from each of the level-1 outcomes; the windows of the level-3 operators are rectangles of size 5×9 or 9×5 within the 9×9 window; the level-4 operator takes one pixel from each of the level-3 outcomes.]

TABLE 3
Performance of a four-level operator (Ψ) and of a two-iteration operator (Φ = Φ^{(2)} ∘ Φ^{(1)}), both with window support 17 × 17, on data set A (8 training and 10 test images).

Operator | Error pixels
Φ^{(1)} | 1760
Φ = Φ^{(2)} ∘ Φ^{(1)} | 628
Ψ^{(1)}_i, i = 1, 2, ..., 6 | 2110 ± 110
Ψ^{(2)} | 677
Ψ^{(3)}_i, i = 1, 2, ..., 5 | 409 ± 30
Ψ^{(4)} | 294
5 CONCLUDING REMARKS
A model for multi-level training of morphological operators based on large neighborhood windows has been proposed. Experimental results show that two-level operators consistently outperform single-level operators, both in terms of MAE and processing time. They also show that multi-level training, by iterating two-level operators, is an effective way of obtaining better results than the usual iterative design techniques.
In order to understand why combining several smaller
window operators results in better performance than just
training an operator on a large window, it is convenient
to look back to the U-shaped error curve presented in
the introduction of this work. According to the typ-
ical behavior, those error curves present a relatively
flat minimum region, corresponding to the windows of
operators with best error performance in test images. If
one uses these windows for the level-1 operators, then
performance similar to the best obtained by a single-level
operator is guaranteed because one could just choose
the level-1 operator with best performance as the output
of the level-2 operator. Thus, it is reasonable to expect
that, instead of just choosing the output of one of the
level-1 operators, if one decides the output based on a
second level of training (from the outputs of the level-1
operators), results should be no worse.
In all experiments reported in this work, a subset of windows for the level-1 operators that resulted in two-level operators with improved performance was found without great effort. The effectiveness of two-level and iterated two-level operators shows the usefulness of the proposed model and justifies further investigation of issues related to the choice of the training architecture, namely the choices concerning the number of levels of training, the number of operators in each level, and their respective windows and training data. With regard to the choice of windows, so far experimental results show that taking just one pixel value from each of the previous-level operator results is often a good choice. Although there exist cases in which taking more pixel values is better, too many pixels should not be considered, mainly when the amount of training data is limited, because that may lead to overfitting. In practice, there
seems to be a tradeoff between the number of pixels
to be taken from each of the outputs of the previous
levels and the number of operators to be combined.
With respect to the number of levels, in the iterative
design approach one should stop iterating as soon as
the error stops decreasing. However, that may depend
on the windows used in each iteration. In the current
state, fully or partially automating choices concerning
all these issues may be considered a real challenge.
Another issue to be investigated is whether additional input processes that do not necessarily reflect geometry and topology (shape features that are captured by windows), but other features like color, texture, or even geometrical and topological features not easily captured by the windows (such as area, size, presence or absence of holes in a component, etc.), could improve the performance of the multi-level operator.
The proposed model may be used to process high-
resolution images by considering windows that have the
effect of sub-sampling the images in different ways. It is
one of our next aims to relate the proposed approach to
the multi-resolution design of morphological operators.
Extension of the proposed approach for the design of
gray-scale morphological operators is also possible. In
the gray-scale case the effects of overfitting are more crit-
ical than in the binary case. Thus, it is expected that good
performance improvement would be achieved by multi-
level operators. However, designing gray-scale operators
is computationally hard and demands a larger amount
of training data. Thus, before tackling multi-level design, single-level design needs further experimentation.
Finally, from a more formal point of view, this ap-
proach and others that consider combination of oper-
ators may be framed in the context of function decom-
position. Given a discrete function, decomposing it as
a composition of functions that depend on a smaller
number of inputs each is a classical problem. In the
context of morphological operator design, an interesting
question is to find out which operators can be obtained by a given training architecture or, conversely, given a class of image operators, to find out whether there is a multi-level training architecture that corresponds to that class.
APPENDIX
RESULTS FOR TEST IMAGES
Test images and the respective results obtained by the designed operators are presented for some of the experiments described in Section 4 (Figures 16 to 21). These and additional images can be found at the web site http://www.vision.ime.usp.br/nonlinear/multilevel.
ACKNOWLEDGMENTS
This work has been supported by FAPESP through pro-
cess 2004/11586-7. N. S. T. Hirata is partially supported
by CNPq, Brazil, under grant 312482/2006-0.
REFERENCES
[1] G. Matheron, Random Sets and Integral Geometry. John Wiley, 1975.
[2] J. Serra, Image Analysis and Mathematical Morphology. Academic
Press, 1982.
[3] P. Soille, Morphological Image Analysis, 2nd ed. Berlin: Springer-
Verlag, 2003.
[4] R. M. Haralick, S. R. Sternberg, and X. Zhuang, “Image Analysis
Using Mathematical Morphology,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. PAMI-9, no. 4, pp. 532–550,
July 1987.
[5] E. R. Dougherty and R. A. Lotufo, Hands-on Morphological Image
Processing. SPIE Press, 2003.
[6] I. Tăbuş, D. Petrescu, and M. Gabbouj, "A Training Framework for Stack and Boolean Filtering: Fast Optimal Design Procedures and Robustness Case Study," IEEE Transactions on Image Processing, vol. 5, no. 6, pp. 809–826, June 1996.
[7] N. R. Harvey and S. Marshall, “The Use of Genetic Algorithms in
Morphological Filter Design,” Signal Processing: Image Communica-
tion, vol. 8, no. 1, pp. 55–71, January 1996.
[8] J. Barrera, E. R. Dougherty, and N. S. Tomita, “Automatic Program-
ming of Binary Morphological Machines by Design of Statistically
Optimal Operators in the Context of Computational Learning
Theory,” Electronic Imaging, vol. 6, no. 1, pp. 54–67, January 1997.
[9] N. S. T. Hirata, E. R. Dougherty, and J. Barrera, “Iterative Design
of Morphological Binary Image Operators,” Optical Engineering,
vol. 39, no. 12, pp. 3106–3123, December 2000.
[10] R. Hirata Jr., M. Brun, J. Barrera, and E. R. Dougherty, “Mul-
tiresolution Design of Aperture Operators,” Journal of Mathematical
Imaging and Vision, vol. 6, no. 3, pp. 199–222, 2002.
[11] J. Yoo, K. L. Fong, J.-J. Huang, E. J. Coyle, and G. B. Adams III,
“A Fast Algorithm for Designing Stack Filters,” IEEE Transactions
on Image Processing, vol. 8, no. 8, pp. 1014–1028, August 1999.
[12] E. J. Coyle and J.-H. Lin, “Stack Filters and the Mean Absolute
Error Criterion,” IEEE Transactions on Acoustics, Speech and Signal
Processing, vol. 36, no. 8, pp. 1244–1254, August 1988.
[13] D. Dellamonica Jr., P. J. S. Silva, C. Humes Jr., N. S. T. Hirata,
and J. Barrera, “An Exact Algorithm for Optimal MAE Stack Filter
Design,” IEEE Transactions on Image Processing, vol. 16, no. 2, pp.
453–462, 2007.
[14] I. Yoda, K. Yamamoto, and H. Yamada, “Automatic Acquisition
of Hierarchical Mathematical Morphology Procedures by Genetic
Algorithms,” Image and Vision Computing, vol. 17, no. 10, pp. 749–
760, August 1999.
[15] M. I. Quintana, R. Poli, and E. Claridge, “Morphological al-
gorithm design for binary images using genetic programming,”
Genetic Programming and Evolvable Machines, vol. 7, no. 1, pp. 81–
102, 2006.
[16] R. Hirata Jr., E. R. Dougherty, and J. Barrera, “Aperture Filters,”
Signal Processing, vol. 80, no. 4, pp. 697–721, April 2000.
[17] P. Salembier, “Structuring element adaptation for morphological
filters,” Visual Communication and Image Representation, vol. 3, no. 2,
pp. 115–136, 1992.
[18] G. J. F. Banon and J. Barrera, “Decomposition of Mappings
between Complete Lattices by Mathematical Morphology, Part I.
General Lattices,” Signal Processing, vol. 30, pp. 299–327, 1993.
[19] H. J. A. M. Heijmans, Morphological Image Operators. Boston:
Academic Press, 1994.
[20] J. Barrera, R. Terada, R. Hirata Jr, and N. S. T. Hirata, “Automatic
Programming of Morphological Machines by PAC Learning,” Fun-
damenta Informaticae, vol. 41, no. 1-2, pp. 229–258, January 2000.
[21] G. J. F. Banon and J. Barrera, “Minimal Representations for
Translation-Invariant Set Mappings by Mathematical Morphol-
ogy,” SIAM J. Applied Mathematics, vol. 51, no. 6, pp. 1782–1798,
December 1991.
[22] J. Barrera and G. P. Salas, “Set Operations on Closed Intervals and
Their Applications to the Automatic Programming of Morpholog-
ical Machines,” Electronic Imaging, vol. 5, no. 3, pp. 335–352, July
1996.
[23] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning. Springer-Verlag, 2001.
[24] T. M. Mitchell, Machine Learning, ser. McGraw-Hill Series in
Computer Science. McGraw-Hill, 1997.
[25] N. S. T. Hirata, J. Barrera, R. Terada, and E. R. Dougherty, “The
Incremental Splitting of Intervals Algorithm for the Design of
Binary Image Operators,” in Proceedings of the 6th International
Symposium: ISMM 2002, H. Talbot and R. Beare, Eds., 2002, pp.
219–228.
[26] E. R. Dougherty and J. Barrera, “Prior information in the design of
optimal binary filters,” in International Symposium on Mathematical
Morphology, ser. Mathematical Morphology and its applications to
Image and Signal Processing, 1998, pp. 259–266.
[27] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algo-
rithms. Wiley, 2004.
[28] D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5,
pp. 241–259, 1992.
[29] N. S. T. Hirata, “Binary image operator design based on stacked
generalization,” in Proceedings of the SIBGRAPI 2005, A. C. Frery
and M. A. F. Rodrigues, Eds., 2005, pp. 63–70.
[30] E. R. Dougherty, “Optimal Mean-Square N-Observation Digital
Morphological Filters I. Optimal Binary Filters,” CVGIP: Image
Understanding, vol. 55, no. 1, pp. 36–54, January 1992.
[31] F. J. Hill and G. R. Peterson, Computer Aided Logical Design with
Emphasis on VLSI, 4th ed. John Wiley & Sons, 1993.
[32] E. R. Dougherty and J. Barrera, “Logical Image Operators,” in
Nonlinear Filters for Image Processing, E. R. Dougherty and J. T.
Astola, Eds. Bellingham: SPIE and IEEE Press, 1999, pp. 1–60.
[33] N. S. T. Hirata, E. R. Dougherty, and J. Barrera, “A Switching
Algorithm for Design of Optimal Increasing Binary Filters Over
Large Windows,” Pattern Recognition, vol. 33, no. 6, pp. 1059–1081,
June 2000.
[34] D. C. Martins Jr., R. M. Cesar Jr., and J. Barrera, “W-operator
window design by minimization of mean conditional entropy,”
Pattern Analysis and Applications, vol. 9, pp. 139–153, 2006.
[35] A. Jain and D. Zongker, “Feature Selection: Evaluation, Ap-
plication, and Small Sample Performance,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 153–158,
February 1997.
Fig. 16. Experiment A (circular element recognition from images scanned at 100 dpi from a book): test, ideal, one-level operator, and two-level operator images, from top to bottom and left to right, respectively.
Fig. 17. Experiment B (map region extraction): test and two-level operator images, respectively.

Fig. 18. Experiment C (character recognition): test, C1 one-level operator, C1 two-level operator, and C2 two-level operator images, from top to bottom and left to right, respectively.
Nina S. T. Hirata received the PhD degree in Computer Science from the University of São Paulo, Brazil, in 2000. She is a professor of computer science at the same university. Her current research interests include nonlinear image processing, machine learning applied to image operator design, multiple classifier systems, interactive image segmentation, and handwriting recognition.
Fig. 19. Experiment D (magazine page text segmentation): test, D1 one-level operator, D1 two-level operator, and D2 two-level operator images, from top to bottom and left to right, respectively.
Fig. 20. Experiment E (book page text segmentation): test and two-level operator images, respectively.
Fig. 21. Experiment F (Boolean noise filtering): test, ideal, one-level operator, and two-level operator images, from top to bottom and left to right, respectively. The images consist of simulated Boolean squares with Boolean noise, both uniformly distributed over the image domain. The squares have sizes that follow a normal distribution, while the noise components are subsets of the 3×3 square, with sizes varying uniformly from 2 to 5 pixels.
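For reference, the following Python sketch illustrates how test images matching the Fig. 21 description could be simulated. It is not the code used in the paper: the image size, the number of squares and noise components, and the mean and standard deviation of the square sizes are illustrative assumptions, since these values are not reported here.

# Minimal sketch (assumed parameters) of the Boolean model described in Fig. 21.
import numpy as np

rng = np.random.default_rng(0)

def simulate_boolean_image(shape=(256, 256), n_squares=30, n_noise=150,
                           size_mean=8.0, size_std=2.0):
    """Return (noisy, ideal) binary images following the Fig. 21 description.
    shape, n_squares, n_noise, size_mean and size_std are assumed values."""
    ideal = np.zeros(shape, dtype=np.uint8)

    # Boolean squares: side lengths drawn from a normal distribution
    # (rounded, at least 1), top-left corners uniform over the image domain.
    for _ in range(n_squares):
        side = max(1, int(round(rng.normal(size_mean, size_std))))
        side = min(side, min(shape) - 1)
        r = rng.integers(0, shape[0] - side)
        c = rng.integers(0, shape[1] - side)
        ideal[r:r + side, c:c + side] = 1

    # Boolean noise: each component is a random subset of a 3x3 window
    # containing 2 to 5 pixels, placed uniformly over the image domain.
    noisy = ideal.copy()
    for _ in range(n_noise):
        k = rng.integers(2, 6)                       # 2, 3, 4 or 5 pixels
        cells = rng.choice(9, size=k, replace=False)
        r = rng.integers(0, shape[0] - 3)
        c = rng.integers(0, shape[1] - 3)
        for cell in cells:
            noisy[r + cell // 3, c + cell % 3] = 1

    return noisy, ideal

noisy, ideal = simulate_boolean_image()

Under this reading, the noisy realization plays the role of the test input and the noise-free image the role of the ideal output shown in Fig. 21.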