Detection of Arrows in On-line Sketched Diagrams
using Relative Stroke Positioning
Martin Bresler, Daniel Průša, Václav Hlaváč
Czech Technical University in Prague, Faculty of Electrical Engineering,
Department of Cybernetics, Technická 2, 166 27, Praha 6, Czech Republic
{breslmar, prusapa1, hlavac}@cmp.felk.cvut.cz
Abstract
This paper deals with recognition of arrows in online
sketched diagrams. Arrows have varying appearance and
thus it is a difficult task to recognize them directly. It is ben-
eficial to detect arrows after other symbols (easier to de-
tect) are already found. We proposed [4] an arrow detector
which searches for arrows as arbitrarily shaped connectors
between already found symbols. The detection is done in two
steps: a) a search for a shaft of the arrow, b) a search for
its head. The first step is relatively easy. However, it might
be quite difficult to find the head reliably. This paper brings
two contributions. The first contribution is a design of an
arrow recognizer where the head is detected using relative
stroke positioning. We embedded this recognizer into the
diagram recognition pipeline proposed earlier [4] and in-
creased the overall accuracy. The second contribution is an
introduction of a new approach to evaluate the relative posi-
tion of two given strokes with neural networks (LSTM). This
approach is an alternative to the fuzzy relative positioning
proposed by Bouteruche et al. [2]. We made a comparison
between the two methods through experiments performed on
two datasets for two different tasks. First, we used a bench-
mark database of hand-drawn finite automata to evaluate
detection of arrows. Second, we used a database presented
in the paper by Bouteruche et al. containing pairs of ref-
erence and argument strokes, where argument strokes are
classified into 18 classes. Our method gave significantly
better results for the first task and comparable results for
the second task.
1. Introduction
This paper deals with on-line handwriting recognition,
where the input consists of a sequence of strokes. A stroke
is a sequence of points captured by an ink-input device (most
commonly a tablet or a tablet PC) as the user was writing
with a stylus or a finger. In handwriting recognition,
the research has already moved from recognition of plain
text to recognition of more structured input such as diagrams.
This work is focused on recognition of arrows in on-line
sketched diagrams.
Arrows are the most important symbols in diagrams,
since they bear the most valuable information about the
diagram structure – what symbols are connected together.
However, it is a difficult task to recognize them because
of their varying appearance. We consider two diagram do-
mains – finite automata (FA) and flowcharts (FC). There is
a freely available benchmark database for each of
the domains: the FA database [4] and the FC database [1].
Figure 1 shows examples of diagrams from these two do-
mains. It is obvious that arrows can be arbitrarily directed
and their shafts might be straight lines, curved lines, or
polylines. Moreover, their heads have a different shape.
There exists an approach, where arrows are detected first
and the knowledge of arrows helps to naturally segment the
rest of the symbols [14]. The problem is that authors of this
approach put very strict requirements on the way the arrow
is drawn. It must consist of one or two strokes and the ar-
row’s head must have only one predefined shape. Another
approach is to detect arrows the same way as other sym-
bols – using a classifier based on the symbol appearance.
Since the arrows might be arbitrarily rotated and the heads
might have different shapes, it is necessary to create several
arrow sub-classes. This approach is more general, but the
achieved accuracy is limited. The state-of-the-art methods
in flowchart recognition consistently achieve very low accuracy
in arrow recognition [5, 3]. We already suggested [4]
that it is better to detect arrows after the other symbols
are detected. We proposed an algorithm, which searches
for arrows as arbitrarily shaped connectors between already
found non-arrow symbols. It works in two stages: a) ar-
row shaft detection, b) arrow head detection. The detection
of the arrow head is based on heuristics and does not achieve
satisfactory precision. In this paper, we employ machine
learning to improve the proposed arrow detector with an
arrow head classifier based on relative stroke positioning.
Figure 1. Examples of hand-drawn diagrams containing arrows connecting symbols with rigid bodies: (a) finite automata, (b) flowchart.
In many cases, appearance does not give us enough in-
formation to classify single strokes and we need some con-
textual information. Relative position of a stroke with re-
spect to a reference stroke is the most intuitive. Bouteruche
et al. [2] addressed this problem directly and proposed a
fuzzy relative positioning method. The authors introduced
a method evaluating the relative position of strokes based
on how pairs of strokes fulfil a set of relations, such
as "the second stroke is on the right of the first stroke",
through defined fuzzy landscapes. They used this method
to solve a prepared task, where pairs of reference and ar-
gument strokes are given and the argument strokes have to
be classified into 18 classes corresponding to several types
of accentuation or punctuation. The information about the
appearance and the relative position of the argument stroke
with respect to the reference stroke must be combined to-
gether to achieve a good recognition rate. This task ade-
quately demonstrates the need for relative positioning sys-
tem. They used Radial Basis Function Networks (RBFN)
as a classifier. The method was further improved by a better
definition of fuzzy landscapes and using SVM by Delaye et
al. [7]. Although fuzzy relative positioning is a powerful
method, useful for more complex tasks such as recognition of
structured handwritten symbols (Chinese characters) [6], it
gives poor results when applied to arrow head detection.
Our work brings two contributions. First, we define ar-
row head detection as a classification of possible arrow head
strokes based on relative positioning. We used this arrow
head classifier to significantly improve the proposed arrow
detector. Second, we propose a new method for evaluation
of the relative position of strokes, which exploits simple
low-level features and uses Bidirectional Long Short Term
Memory (BLSTM) Recurrent Neural Network (RNN) as a
classifier. The BLSTM RNN proved to be a good tool for
classification of individual strokes [13].
The rest of the paper is organized as follows. Section 2
describes the proposed arrow detector and the way the rel-
ative positioning is exploited to determine which strokes
represent the head of the arrow. Section 3 introduces our
method for evaluation of the relative position. Experiments
and their results are described in Section 4. Finally, we
make a conclusion in Section 5.
2. Arrow detector
Arrows are symbols with a non-rigid body. They consist
of two parts: shaft and head. The head defines the orien-
tation of the arrow. However, an arrow's appearance can
vary arbitrarily within the given domain. They
can have various shapes, lengths, heads, and directions.
Therefore, it is a difficult task to detect arrows with or-
dinary classifiers based on symbol appearance. However,
each arrow connects two other symbols with a rigid body
(see Figure 1). It is beneficial to detect these symbols first
and leave the arrow detection to another classifier detecting
arrows between pairs of these symbols. This new classifier
must perform the following two steps:
1. Find a shaft of the arrow connecting the given two
symbols. This shaft is just a sequence of strokes lead-
ing from a vicinity of the first symbol to a vicinity of
the second symbol and it is undirected.
2. Find a head of the arrow, which is located around one
of the end-points of the shaft. The head defines orien-
tation of the arrow (if it is heading from the first sym-
bol to the second symbol or vice versa).
The detection of an arrow’s shaft can be done iteratively
by simply adding strokes to a sequence such that the first
stroke starts in a vicinity of the first symbol and the last
stroke ends in a vicinity of the second symbol. A new stroke
is added to the sequence only if the distance between the
end-point of the last stroke and the end-point of the new
stroke is smaller than a threshold. The algorithm must con-
sider all possible combinations of strokes creating a valid
connection between the given two symbols.

Figure 2. Arrow recognition pipeline: detection of the arrow shaft; extraction of reference strokes and points; query-stroke search and classification at each reference point (Ref. point A / Ref. point B, yielding Head A / Head B); selection of the best arrow head. The recognition process is illustrated on a simple example of two symbols from the FC domain.

The search
space can be reasonably reduced by setting a maximal number
of strokes in the sequence. This number depends on the
domain and on how many strokes users use to draw arrows'
shafts: typically four for flowcharts and two for finite
automata. We can immediately remove some shafts which
are in conflict with other shafts, and keep those with the
smallest sum of the following distances: a) distance
between the first symbol and the first
stroke of the shaft, b) distance between the second symbol
and the last stroke of the shaft, c) distances between indi-
vidual strokes of the shaft.
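The shaft search described above can be sketched as follows. This is a simplified sketch, not the authors' implementation: it ignores stroke reversal, represents a symbol simply by a list of its boundary points, and the names and threshold values are ours.

```python
import math
from itertools import permutations

def find_shafts(strokes, sym_a, sym_b, vicinity=30.0, gap=25.0, max_len=4):
    """Enumerate stroke sequences leading from the vicinity of symbol A to
    the vicinity of symbol B. A stroke is a list of points; a symbol is
    given by a list of its boundary points. Thresholds are illustrative."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def near(point, symbol):
        return min(dist(point, q) for q in symbol) < vicinity

    shafts = []
    for k in range(1, max_len + 1):          # per-domain limit on stroke count
        for seq in permutations(range(len(strokes)), k):
            pts = [strokes[i] for i in seq]
            # the chain must start near A and end near B ...
            if not (near(pts[0][0], sym_a) and near(pts[-1][-1], sym_b)):
                continue
            # ... and consecutive strokes must meet within the gap threshold
            if all(dist(pts[i][-1], pts[i + 1][0]) < gap for i in range(k - 1)):
                shafts.append(seq)
    return shafts
```

Among the valid sequences found this way, the detector then keeps the ones with the smallest summed distances as described above.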
Since we do not know the orientation of the arrow yet
and the shaft is undirected, we have to consider both end-
points of the shaft and try to find two heads (one in the
vicinity of each end-point). Ideally we will be able to find
just one head. In practice, it can happen that we find two
heads and have to decide which one is better. The detection
of an arrow's head is not a trivial task, because there
might be many interfering strokes around the end-points
of the shaft: heads of other arrows or text. Deciding
which strokes represent the true arrow's head we are
looking for, and which do not, is a task where stroke
positioning can be beneficially used. First, we define a
reference stroke (a sub-stroke of the shaft) and a reference
point (end-point of the shaft), which are used to express
a relative position of query strokes (details follow in Sec-
tion 2.1). Second, this information about relative position is
given to a classifier making the decision. The query strokes
are all strokes in a vicinity of a given end-point of the shaft,
which are not a part of the shaft itself nor the two given
symbols. We make a classification into two classes: head
and not-head. Explanation for the evaluation of the relative
position of strokes and classification is given in Section 3.
Let us just note that the classifier returns a class into which
the query stroke is classified along with a potential. We use
this potential to decide which head is of better quality in the
case we find two. We just compute a sum of potentials of all
strokes in each head and decide for the head with the bigger
value. This slightly favours heads consisting of a higher
number of strokes, which is desirable in most cases. A
pseudocode for the algorithm that we just described is di-
vided into two procedures and presented in the supplemen-
tary material as Algorithm 1 and Algorithm 2. The arrow
recognition pipeline is depicted in Figure 2.
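The selection rule between two candidate heads can be sketched as follows; the representation of a head as a list of (stroke id, potential) pairs is a hypothetical simplification, not the authors' data structure.

```python
def select_head(head_a, head_b):
    """Pick the better of two candidate arrow heads. Each head is a list of
    (stroke_id, potential) pairs returned by the head/not-head classifier.
    Summing the potentials slightly favours heads with more strokes."""
    def score(head):
        return sum(p for _, p in head)

    if not head_a and not head_b:
        return None                     # no head found at either end-point
    return head_a if score(head_a) >= score(head_b) else head_b
```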
It happens quite often that the user draws a shaft and a
head of an arrow by one stroke. Our algorithm would fail
in that case. Therefore, we make one important step before
we try to find the arrow’s head – we segment the last stroke
of the shaft into smaller sub-strokes in such a way that the
head is split from the shaft. Created sub-strokes are divided
into two groups. One group is used to complete the shaft
so that it reaches the symbol again. Sub-strokes of the
second group are put into the set of query strokes possibly
forming the head. Our splitting algorithm is described in
Section 2.2. If the shaft and the head are not drawn by one
stroke, the algorithm will ideally perform no segmentation
and this step can be skipped.
2.1. Reference stroke and reference point
It is necessary to define a reference stroke. The position of all
query strokes will be evaluated relative to it.
Naturally, it seems that the arrow’s shaft should be the refer-
ence stroke. However, it is better to use just a sub-stroke of
the shaft for this purpose. The reason is that the shaft might
be arbitrarily curved or bent, the whole arrow might
be arbitrarily rotated, and we want to normalize the input
in such a way that the reference stroke always has more or
less the same appearance and the query strokes always have
more or less the same relative position. Therefore, we create
a sub-stroke beginning at the end-point of the shaft with
the shape of a line segment. It is done iteratively by adding
points to the newly created stroke until the value of a criterion,
expressing how similar the stroke is to a line, exceeds
a threshold. The criterion is the ratio of the distance between
the end-points of the stroke to the path length of the
stroke (the sum of distances between neighbouring points). We
set the threshold empirically to 0.95. Another condition is
that the distance between the end-points of the stroke must be
bigger than a threshold empirically derived from the average
length of strokes, because the possible presence of so-called
hooks at the ends of strokes would cause a small value of
the criterion for short strokes. Figure 3 illustrates how the
reference stroke is determined as a sub-stroke of the shaft.
Then we rotate the reference stroke and all query strokes
by such an angle that the vector given by the end-points of
the reference stroke points in the direction of the x-axis.
In other words, the true arrow head will then point from
left to right. For the purposes of our method for evaluating
the relative position of strokes (described in Section 3),
we also have to define a reference point.
Obviously, it is the end-point of the shaft.
Figure 3. Example showing a diagram and the way of choosing
the reference point, the reference stroke, and the rotation. The individual
pictures illustrate the following: (a) the whole diagram with a
highlighted (red) arrow to be detected, (b) the detected arrow's shaft
is blue and the right end-point is considered to be the reference point,
the second point is green, the angle α used to rotate query strokes is
marked, (c) rotation is done, the reference point is red as well as the
strokes of the real arrow's head, (d) analogously to (b) with the
other end-point considered, (e) analogously to (c) with the exception
that there is no real head, because the arrow's orientation is wrong.
Because we still do not know the orientation of the arrow,
we have to consider both options: the arrow is heading to
the first symbol or the second symbol. Therefore, we define
two reference points, the end-points of the shaft. A reference
(sub)stroke is then associated with each of these two points.
Figure 3 shows the whole process of the reference stroke
extraction and rotation.
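The extraction and rotation of this section can be sketched as follows. This is our reading of the procedure, with assumed names throughout; `min_len` stands in for the threshold that the paper derives empirically from the average stroke length.

```python
import math

def line_likeness(points):
    """Ratio of end-point distance to path length; 1.0 for a perfect line."""
    path = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    return math.dist(points[0], points[-1]) / path if path > 0 else 1.0

def reference_substroke(shaft_points, threshold=0.95, min_len=20.0):
    """Grow a line-like sub-stroke from the reference end of the shaft.
    shaft_points are ordered so that index 0 is the reference point."""
    sub = shaft_points[:2]
    for p in shaft_points[2:]:
        cand = sub + [p]
        # stop once the stroke is long enough and no longer resembles a line
        if math.dist(cand[0], cand[-1]) > min_len and line_likeness(cand) < threshold:
            break
        sub = cand
    return sub

def rotate_to_x_axis(ref, strokes):
    """Rotate query strokes about the reference point so that the vector from
    the inner end of the reference sub-stroke towards the reference point
    points along the positive x-axis (true heads then point left to right)."""
    (rx, ry), (ix, iy) = ref[0], ref[-1]
    a = -math.atan2(ry - iy, rx - ix)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def rot(p):
        dx, dy = p[0] - rx, p[1] - ry
        return (dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a)

    return [[rot(p) for p in s] for s in strokes]
```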
2.2. Stroke segmentation
Stroke segmentation is a very important field of research,
because it is a frequently used preprocessing step. Therefore,
various papers deal with this problem. The
segmentation is done by defining a set of splitting points.
The substantial information is curvature and speed defined
at each point and geometric properties of stroke segments.
The common approach is to find tentative splitting points
with high curvature and low speed. The best subset of
these points is selected according to the error function fit-
ting points of each segment into selected primitives. The
most common primitives are line segments and arcs [8, 15].
It is also possible to use machine learning to train a classifier
detecting the splitting points [9, 11].
The presented algorithms are sophisticated and can find
segments fitting predefined primitives. However, using
any of these methods seems to be overkill for our
task. We do not need to split a stroke at any precisely defined
point, nor to create segments with particular geometrical
properties (line segments or arcs). All we need is to
split the arrow's head from its body, and it is not important
if both the body and the head are further split into several
segments. Therefore, we suggest using a much simpler
algorithm for stroke segmentation. Its description follows.
We compute a value AA, which we call "accumulated angle",
associated with each point of the stroke S = {p_1, p_2, . . . , p_n},
according to the following equation:

AA_i = mean(Rank3{A(i, 1), . . . , A(i, min(i − 1, n − i, R))}),    (1)

where i is the index of the point in the sequence, Rank3
is an operator choosing up to the three smallest values of
a given set, R is the maximal radius, and A is a function
computing the angle between the two vectors defined by the
given reference point and its two neighbouring points chosen
by the size of the radius. The function A is defined as follows:

A(i, r) = arccos( (p_i p_{i−r} · p_i p_{i+r}) / (‖p_i p_{i−r}‖ · ‖p_i p_{i+r}‖) ),    (2)

where p_i p_{i−r} and p_i p_{i+r} denote the vectors from p_i to
p_{i−r} and p_{i+r}, respectively. Let us note that AA_i is computed
according to Equation (1) only for i ∈ {2, . . . , n − 1},
and AA_1 = AA_n = 0. We
define the initial set of splitting points by taking the points
where AA reaches a local minimum and the value is
smaller than mCoeff · mean{AA_1, . . . , AA_n}. In the case
that two splitting points are too close to each other
(dist(p_i, p_j) < distThresh), we remove one with the smaller
AA value. We set mCoeff = 0.5 and distThresh = 200
empirically. After this removal, the segmentation is done.
We tested the described algorithm on arrows from the FA
database (see Section 4.1) which were drawn by one stroke
and it turned out that the algorithm split the head from the
body in 100% of cases. Let us emphasize that the parameters
mCoeff and distThresh are tunable, which makes it easy to
adjust the algorithm to the demands of a given task.
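Equations (1)–(2) and the selection rule above can be sketched as follows. The names are ours; for the tie rule between close-by candidates, this sketch keeps the sharper corner (smaller AA), which is our reading of the intent.

```python
import math

def point_angle(points, i, r):
    """Angle A(i, r) at p_i between the vectors to p_{i-r} and p_{i+r} (Eq. 2).
    Assumes neighbouring points are distinct."""
    (xi, yi), (xa, ya), (xb, yb) = points[i], points[i - r], points[i + r]
    ux, uy, vx, vy = xa - xi, ya - yi, xb - xi, yb - yi
    c = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, c)))

def accumulated_angles(points, R=10):
    """AA_i: mean of the up-to-three smallest A(i, r) over admissible radii
    (Eq. 1, with 0-based indices); zero at the extreme points."""
    n = len(points)
    aa = [0.0] * n
    for i in range(1, n - 1):
        vals = sorted(point_angle(points, i, r)
                      for r in range(1, min(i, n - 1 - i, R) + 1))
        aa[i] = sum(vals[:3]) / len(vals[:3])
    return aa

def split_points(points, R=10, m_coeff=0.5, dist_thresh=200.0):
    """Splitting points: local minima of AA below m_coeff * mean(AA); of two
    candidates closer than dist_thresh, keep the smaller-AA one (our reading)."""
    aa = accumulated_angles(points, R)
    limit = m_coeff * sum(aa) / len(aa)
    cand = [i for i in range(1, len(aa) - 1)
            if aa[i] < aa[i - 1] and aa[i] <= aa[i + 1] and aa[i] < limit]
    kept = []
    for i in cand:
        if kept and math.dist(points[kept[-1]], points[i]) < dist_thresh:
            if aa[i] < aa[kept[-1]]:
                kept[-1] = i
        else:
            kept.append(i)
    return kept
```

On a stroke drawn right along the x-axis and then sharply back up-left (a head-like turn), the single split point lands at the turning point.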
3. Evaluation of relative position of strokes
Unlike the method by Bouteruche et al., where a query
stroke is evaluated with respect to the whole reference
stroke through its fuzzy structuring element, we propose
to evaluate the relative position of a query stroke with
respect to just a single point of the reference stroke. In the
case of arrows, it is the end-point of the arrow's shaft. In
the case of the task defined by Bouteruche et al., it can be
an arbitrary fixed point; we propose to choose the center of
the reference stroke's bounding box.

Figure 4. Example showing pairs of reference and query strokes and the extracted sequences of features (angles and distances) for both domains: (a) the arrow domain, (b) the accent domain. The reference point R is marked red. In the case of the arrow domain, both the reference and the query strokes are already rotated. The query stroke is a sequence of points {p_1, p_2, . . . , p_n}.
We are given a reference stroke, which is represented by
its reference point R, and a query stroke S defined by a
sequence of its points: S = {p_1, p_2, . . . , p_n}. To describe the
relative position of S with respect to R, we express the relative
position of each point p_i using polar coordinates: the position
of each point is defined by the angle α_i between the vector
from R to p_i and the x-axis, and by the distance d_i = ‖R p_i‖.
For each pair of a reference and a query stroke, we create a
sample consisting of the sequence of the described features
{[α_1, d_1], [α_2, d_2], . . . , [α_n, d_n]} and a label indicating the
class of the query stroke. For illustration, see Figure 4.
We propose to use a (B)LSTM RNN as a classifier, because
it achieves the best results in many applications. However,
it is possible to use other tools for classifying sequences
(e.g. Hidden Markov Models). When dealing with neural
networks, it makes sense to normalize the inputs:
v̂_k = (v_k − m_k) / σ_k,    (3)

where v_k is an input value, v̂_k is the normalized value, and m_k
and σ_k are, respectively, the mean and the standard deviation of
all values of the same feature from the training database.
We use this normalization to normalize the distance only.
The advantage of the proposed features is that they
are simple and fast to extract (low time complexity). Moreover,
they express the relative position of the query stroke with
respect to the reference point as well as the shape of the
query stroke: it is possible to reconstruct the trajectory of
the query stroke from the sequence of features. This leads to
a simple implementation and fast evaluation.
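The feature extraction and the normalization of Equation (3) can be sketched as follows; the function names are ours, and, as in the text, only the distance feature is normalized.

```python
import math

def polar_features(ref_point, query_stroke):
    """Sequence of [angle, distance] pairs: each query point expressed in
    polar coordinates relative to the reference point R."""
    rx, ry = ref_point
    return [[math.atan2(y - ry, x - rx), math.hypot(x - rx, y - ry)]
            for x, y in query_stroke]

def normalize_distances(samples):
    """Z-score normalization of the distance feature only (Equation (3)),
    with mean and standard deviation taken over the training set."""
    ds = [d for seq in samples for _, d in seq]
    m = sum(ds) / len(ds)
    s = (sum((d - m) ** 2 for d in ds) / len(ds)) ** 0.5 or 1.0  # guard s = 0
    return [[[a, (d - m) / s] for a, d in seq] for seq in samples]
```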
4. Experiments
We made experiments on two tasks. The first one is the
task defined in this paper – classification of strokes into two
classes, head and not-head, when the reference stroke is a part
of the arrow's shaft. The second task is to classify argument
strokes, representing accentuation or punctuation of their
reference strokes, into 18 classes. The task as well as the database
called ACCENT was proposed by Bouteruche et al. In the
case of arrows we additionally evaluated the whole process
of arrows detection, where the stroke classification is a sub-
task. We used both positioning methods to solve both tasks
and we made a comparison. All experiments were done on
a standard tablet PC Lenovo X230 (Intel Core i5 2.6 GHz,
8GB RAM) with 64-bit Windows 7 operating system.
4.1. Arrows
We used the FA database for this experiment. The ver-
sion 1.1 contains the annotation of heads and shafts of ar-
rows. We extracted a reference point and stroke for each
arrow as described in Section 2.1. The only difference is
that the shaft is known from the annotation. We created
a set of query strokes and rotated these strokes according
to the reference stroke. We extracted features with respect
to the reference point or the reference stroke depending on
the used method for each query stroke and assigned a label
based on the annotation from the database. We refer to the
samples with the label head as positive and those with the
label not-head as negative samples. The FA database con-
sists of 12 diagram patterns drawn by several users and it is
split into training and test dataset. The training dataset con-
tains diagrams from 11 users (132 diagrams) and the test
dataset diagrams from 7 users (84 diagrams). On average,
each diagram is formed of 54 strokes and contains 5 symbols
and 10 arrows. We extracted 1480/834 positive and
1263/1019 negative samples from the training/test dataset.
Arrows drawn by one stroke are manually segmented in the
database. However, to demonstrate our segmentation algo-
rithm from Section 2.2, we created a second test dataset
(referred to as test2), where we further segmented query strokes.
Obtained sub-strokes created new samples with the same label
as the original ones. We used this dataset to show that possi-
ble oversegmentation will not lower the final precision. We
created 1252 positive and 1876 negative examples this way.
For our method, we used LSTM and BLSTM RNNs im-
plemented within the library JANNLab [12]. We tried dif-
ferent numbers of nodes in the hidden layer to get the best
performance. We always trained the network for 200 epochs
with the following parameters: learning rate 0.001, momen-
tum 0.9. We achieved the best overall precision of 99.9%
with the BLSTM RNN with 32 nodes in the hidden layer.
However, it might be important to find a trade-off between
precision and time complexity and thus it might be better
to use the LSTM RNN with only 8 nodes in the hidden
layer, because it is significantly faster. It gives the precision
of 99.6 % and the average time needed for classification is
0.79 ms. For details, refer to Figure 5. The best achieved
precisions for individual classes are given in Table 1. The
precision achieved on test2 with the best trained neural
network did not decrease, remaining at 99.9 %.
For the method of Bouteruche et al., we used a RBFN
implemented within the library Encog [10]. We set the
number of nodes in the hidden layer to be a power of
the number of features, which leads to equally spaced RBF
centers. It is the setting giving the best performance. We
tried two sets of features proposed by Bouteruche et al. re-
ferred in their paper by numbers 4 and 5 and we achieved
the accuracy of 95.4 % and 88.2 %, respectively. It is not
surprising that the feature set number 5 reached much worse
results. It contains features expressing how much a query
stroke fits into structuring elements of all classes. However,
in this case, we have just two classes and the class of nega-
tive samples contains arbitrarily shaped strokes and thus the
structuring elements are too wide. We also implemented the
method by Delaye et al. [7]. Their filtered fuzzy landscape
is an improvement of Bouteruche's feature set 5 and thus
it gives rather low precision for the very same reason. The
feature set number 4 gives much better results. However, it
was still inferior in comparison with our method – the best
overall precision of 95.36 %. For more detailed results see
again Table 1.
Since we use an RNN in our method, the classification has
a higher time complexity (especially with an increasing
complexity of the net). The classification made by an RBFN is
indeed very fast. On the other hand, it is much faster to
extract the low level features we use: 0.016 ms per sample.
Feature extraction is slower in the case of fuzzy positioning:
2.89 ms per sample for the feature set number 4 and 0.99 ms
per sample for the feature set number 5.
Figure 5. Dependency of precision and time complexity on the number of nodes in the hidden layer (2–32) of RNNs for the FA database: precision [%] and the time needed to classify one sample [ms], each plotted for LSTM and BLSTM.
Method positive negative overall
Ours 99.91 % 99.85 % 99.88 %
Bouteruche et al. (4) 98.56 % 92.75 % 95.36 %
Bouteruche et al. (5) 94.24 % 83.32 % 88.24 %
Delaye et al. 95.17 % 86.07 % 90.17 %
Table 1. Comparison of precisions for arrow heads detection.
4.1.1 Arrow detector test
We took all annotated symbols with rigid bodies and
tried to find arrows with the arrow detector we proposed
(query strokes for arrow heads were classified with our best
BLSTM RNN). We compared the detected arrows with an-
notated arrows. Let us remind that all pairs of symbols were
considered. Conflicting arrow shafts were removed imme-
diately. However, adding arrow heads may cause another
conflicts. The result of the arrow detector is a list of ar-
row candidates and a structural analysis should be done to
solve the conflicts. However, we tried to remove conflicts
by simply keeping arrows with higher confidence to see how
it affects recall and precision. The test dataset of the FA
database contains 796 arrows. We achieved a recall of
95.4 % / 94.2 % and a precision of 41.5 % / 95.4 % without /
with conflict removal, respectively. Our arrow detector
performs 106.5 stroke classifications per diagram on average
while searching for arrow heads, although there are only
10 arrows per diagram on average.
4.1.2 Diagram recognition pipeline test
We embedded our arrow detector into the diagram recogni-
tion pipeline proposed earlier [4] and made experiments on
the FA and FC databases. The FC database does not contain
annotation of arrow heads and shafts. Therefore, we used
the arrow head classifier trained on the FA database in both
cases. The results are shown in Tables 2, 3. Although there
is an improvement in both domains, it is more significant
in the FA domain. The recognition accuracy increased in
all symbol classes, which shows that misrecognized arrows
can cause further errors in classification of other symbols.
Class          Correct stroke labeling [%]   Correct symbol segmentation and recognition [%]
               Previous    Proposed          Previous    Proposed
Arrow            89.3        94.9              84.4        92.8
Arrow in         78.5        85.0              80.0        84.0
Final state      96.1        99.2              93.8        98.4
State            95.2        96.9              94.5        97.2
Label            99.1        99.8              96.0        99.1
Total            94.5        97.4              91.5        96.4
Table 2. Diagram recognition results for the FA domain.
Class          Correct stroke labeling [%]   Correct symbol segmentation and recognition [%]
               Previous    Proposed          Previous    Proposed
Arrow            85.3        88.7              74.4        78.1
Connection       93.3        94.1              93.6        95.1
Data             95.6        96.4              88.8        90.6
Decision         90.8        90.9              74.1        75.3
Process          93.7        95.2              87.2        88.1
Terminator       89.7        90.2              88.1        88.9
Text             99.0        99.3              87.9        89.7
Total            95.2        96.5              82.8        84.43
Table 3. Diagram recognition results for the FC domain.
4.2. Accent
The Accent database consists of pairs of reference and
argument strokes. The task is to classify the argument
strokes into 18 graphic gestures. Two of them correspond
to the addition of a stroke to a character. The 16 others (see
Figure 6) correspond to an accentuation of their reference
character (acute, grave, cedilla, etc.), to a punctuation sym-
bol (comma, dot, apostrophe, etc.) or to an editing gesture
(space, carriage return, etc.). As several subsets of gestures
have the same shape, the only way to discriminate them is
to use spatial context – their relative position. The exam-
ples of the benchmark have been written on a PDA by 14
writers. The training database contains 4243 examples of 8
writers and the test database contains 2393 examples of 6
writers. None of the writers is common to both data sets.
Figure 6. Classes of the argument strokes in ACCENT database.
To apply our method, we set a center of each reference
stroke’s bounding box as a reference point and extracted
features. We tried LSTM and BLSTM RNNs the same
way as in the case of the Arrow database. However, we
achieved a precision of only 91.9 %. It turned out that
our features have a problem distinguishing very small argument
strokes like acute, apostrophe, or dieresis. These
we decided to enrich the set of features and add local fea-
tures describing the appearance of strokes. We used four
features introduced by Otte et al. [13]: an index of the point
to distinguish long and short strokes, sine and cosine of the
angle between the current and the last line segment (zero
for extreme points), and sum of lengths of the current and
the previous line segments. Let us note that the point in-
dices and distances are normalized (3). We refer to the two
sets of features and associated experiments as basic and ex-
tended. We achieved the best precision with the extended
features and the BLSTM RNN with 32 nodes in the hidden
layer, which was 93.6 %. The training was done again with
the learning rate of 0.001 and the momentum of 0.9. The
ROC curves and time complexities are shown in Figure 7.
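The four local features adopted from Otte et al. [13] can be sketched per point as follows. This is our sketch of the extended feature set; the normalization of indices and lengths by Equation (3) is omitted, and distinct neighbouring points are assumed.

```python
import math

def extended_features(stroke):
    """Per-point local features after Otte et al. [13]: point index, sine and
    cosine of the angle between the previous and the current line segment
    (zero at the extreme points), and the summed length of both segments."""
    n = len(stroke)
    feats = []
    for i, (x, y) in enumerate(stroke):
        if 0 < i < n - 1:
            ux, uy = x - stroke[i - 1][0], y - stroke[i - 1][1]  # previous segment
            vx, vy = stroke[i + 1][0] - x, stroke[i + 1][1] - y  # current segment
            lu, lv = math.hypot(ux, uy), math.hypot(vx, vy)
            feats.append([i,
                          (ux * vy - uy * vx) / (lu * lv),  # sine via cross product
                          (ux * vx + uy * vy) / (lu * lv),  # cosine via dot product
                          lu + lv])
        else:  # extreme point: zero angle, only one adjacent segment exists
            j = i + 1 if i == 0 else i - 1
            feats.append([i, 0.0, 0.0, math.dist(stroke[j], (x, y))])
    return feats
```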
In the case of the method of Bouteruche et al., we used
our reimplementation and ran the experiments. We confirm
the results they stated – a precision of 95.75 %.
5. Conclusions
We have shown how important and difficult a task arrow
recognition is for the whole process of diagram recognition.
We designed an arrow recognizer which detects arrows
in two steps: a) detection of an arrow's shaft, b) detection
of an arrow's head. The first step is easy, because the search
for a shaft is guided by detected symbols connected by the
arrow. For the second step, we proposed a novel arrow head
Figure 7. Dependency of precision and running time on the number of nodes in the hidden layer (2–64) of RNNs for the ACCENT database: precision [%] and the time needed to classify one sample [ms], each plotted for the basic and extended feature sets with LSTM and BLSTM.
classifier based on relative stroke positioning. We presented
a classification method based on low-level features using
(B)LSTM RNNs. We embedded the proposed arrow detec-
tor into diagram recognition pipeline and we increased the
accuracy of the state-of-the-art diagram recognizer on the
benchmark databases of finite automata and flowcharts.
We have also made a comparison with the state-of-the-art
method for relative positioning. This method is
unable to solve the proposed task adequately and reaches
inferior precision. However, we have also made the comparison
on the task for which this method was developed, and
there our method gives slightly worse results. This
implies that fuzzy positioning might be a good
solution for some sorts of tasks (data), but it is not a general
tool. On the other hand, our method seems to be more
general, since it gave relatively good results in both cases.
Even where it gives slightly worse results, it might be
a good alternative thanks to its simplicity and fast feature
extraction.
Acknowledgment
The first author was supported by the Grant Agency of
the CTU under the project SGS13/205/OHK3/3T/13. The
second and the third authors were supported by the Grant
Agency of the Czech Republic under Project P103/10/0783
and the Technology Agency of the Czech Republic under
Project TE01020197 Center Applied Cybernetics, respec-
tively.
References
[1] A.-M. Awal, G. Feng, H. Mouchere, and C. Viard-Gaudin.
First experiments on a new online handwritten flowchart
database. In DRR 2011, pages 1–10, 2011.
[2] F. Bouteruche, S. Macé, and E. Anquetil. Fuzzy relative positioning for on-line handwritten stroke analysis. In Proceedings of IWFHR 2006, pages 391–396, 2006.
[3] M. Bresler, D. Průša, and V. Hlaváč. Modeling flowchart structure recognition as a max-sum problem. In Proceedings of ICDAR 2013, pages 1247–1251, August 2013.
[4] M. Bresler, T. V. Phan, D. Průša, M. Nakagawa, and V. Hlaváč. Recognition system for on-line sketched diagrams. In Proceedings of ICFHR 2014, pages 563–568, September 2014.
[5] C. Carton, A. Lemaitre, and B. Couasnon. Fusion of statis-
tical and structural information for flowchart recognition. In
Proceedings of ICDAR 2013, pages 1210–1214, 2013.
[6] A. Delaye and E. Anquetil. Fuzzy relative positioning tem-
plates for symbol recognition. In Proceedings of ICDAR
2011, pages 1220–1224, September 2011.
[7] A. Delaye, S. Macé, and E. Anquetil. Modeling relative positioning of handwritten patterns. In Proceedings of IGS 2009, pages 122–127, 2009.
[8] M. El Meseery, M. El Din, S. Mashali, M. Fayek, and N. Dar-
wish. Sketch recognition using particle swarm algorithms. In
Proceedings of ICIP 2009, pages 2017 – 2020, 2009.
[9] G. Feng and C. Viard-Gaudin. Stroke fragmentation based
on geometry features and HMM. CoRR, 2008.
[10] Heaton Research, Inc. Encog Machine Learning Framework,
2013. http://www.heatonresearch.com/encog.
[11] J. Herold and T. F. Stahovich. Classyseg: A machine learning
approach to automatic stroke segmentation. In Proceedings
of SBIM 2011, pages 109–116, 2011.
[12] S. Otte, D. Krechel, and M. Liwicki. JANNLab Neural Net-
work Framework for Java. In Proceedings of MLDM 2013,
pages 39–46, 2013.
[13] S. Otte, D. Krechel, M. Liwicki, and A. Dengel. Local
feature based online mode detection with recurrent neural
networks. In Proceedings of ICFHR 2012, pages 531–535,
2012.
[14] A. Stoffel, E. Tapia, and R. Rojas. Recognition of on-line
handwritten commutative diagrams. In Proceedings of IC-
DAR 2009, pages 1211–1215, 2009.
[15] A. Wolin, B. Paulson, and T. Hammond. Sort, merge, re-
peat: An algorithm for effectively finding corners in hand-
sketched strokes. In Proceedings of SBIM 2009, pages 93–
99, 2009.