Detection of Arrows in On-line Sketched Diagrams
using Relative Stroke Positioning
Martin Bresler, Daniel Průša, Václav Hlaváč
Czech Technical University in Prague, Faculty of Electrical Engineering,
Department of Cybernetics, Technická 2, 166 27 Praha 6, Czech Republic
{breslmar, prusapa1, hlavac}@cmp.felk.cvut.cz
Abstract
This paper deals with recognition of arrows in online
sketched diagrams. Arrows have varying appearance and
thus it is a difficult task to recognize them directly. It is ben-
eficial to detect arrows after other symbols (easier to de-
tect) are already found. We proposed [4] an arrow detector
which searches for arrows as arbitrarily shaped connectors
between already found symbols. The detection is done in two
steps: a) a search for a shaft of the arrow, b) a search for
its head. The first step is relatively easy. However, it might
be quite difficult to find the head reliably. This paper brings
two contributions. The first contribution is a design of an
arrow recognizer where the head is detected using relative
strokes positioning. We embedded this recognizer into the
diagram recognition pipeline proposed earlier [4] and in-
creased the overall accuracy. The second contribution is an
introduction of a new approach to evaluate the relative posi-
tion of two given strokes with neural networks (LSTM). This
approach is an alternative to the fuzzy relative positioning
proposed by Bouteruche et al. [2]. We made a comparison
between the two methods through experiments performed on
two datasets for two different tasks. First, we used a bench-
mark database of hand-drawn finite automata to evaluate
detection of arrows. Second, we used a database presented
in the paper by Bouteruche et al. containing pairs of ref-
erence and argument strokes, where argument strokes are
classified into 18 classes. Our method gave significantly
better results for the first task and comparable results for
the second task.
1. Introduction
This paper deals with on-line handwriting recognition,
where the input consists of a sequence of strokes. A stroke
is a sequence of points captured by an ink-input device
(most commonly a tablet or a tablet PC) as the user was writ-
ing with a stylus or a finger. In handwriting recognition,
the research has already moved from recognition of plain
text to recognition of more structured input such as diagrams.
This work is focused on recognition of arrows in on-line
sketched diagrams.
Arrows are the most important symbols in diagrams,
since they bear the most valuable information about the
diagram structure – which symbols are connected together.
However, it is a difficult task to recognize them because
of their varying appearance. We consider two diagram do-
mains – finite automata (FA) and flowcharts (FC). There is
a freely available benchmark database for each of
the domains: the FA database [4] and the FC database [1].
Figure 1 shows examples of diagrams from these two do-
mains. It is obvious that arrows can be arbitrarily directed
and their shafts might be straight lines, curved lines, or
polylines. Moreover, their heads can have different shapes.
There exists an approach, where arrows are detected first
and the knowledge of arrows helps to naturally segment the
rest of the symbols [14]. The problem is that the authors of
this approach put very strict requirements on the way the arrow
is drawn. It must consist of one or two strokes and the ar-
row’s head must have only one predefined shape. Another
approach is to detect arrows the same way as other sym-
bols – using a classifier based on the symbol appearance.
Since the arrows might be arbitrarily rotated and the heads
might have different shapes, it is necessary to create several
arrow sub-classes. This approach is more general, but the
achieved accuracy is limited. The state-of-the-art methods
in flowchart recognition consistently achieve very low accu-
racy in arrow recognition [5, 3]. We already suggested [4]
that it is better to detect arrows after the other symbols
are detected. We proposed an algorithm, which searches
for arrows as arbitrarily shaped connectors between already
found non-arrow symbols. It works in two stages: a) ar-
row shaft detection, b) arrow head detection. The detection
of arrow head is based on heuristics and does not achieve
satisfactory precision. In this paper, we employ machine
learning to improve the proposed arrow detector with arrow
head classifier based on relative strokes positioning.
Figure 1. Examples of hand-drawn diagrams containing arrows connecting symbols with rigid bodies: (a) finite automata, (b) flowchart.
In many cases, appearance does not give us enough in-
formation to classify single strokes and we need some con-
textual information. Relative position of a stroke with re-
spect to a reference stroke is the most intuitive. Bouteruche
et al. [2] addressed this problem directly and proposed a
fuzzy relative positioning method. The authors introduced
a method evaluating the relative position of strokes based
on how well pairs of strokes fulfil a set of relations, such
as "the second stroke is to the right of the first stroke",
through defined fuzzy landscapes. They used this method
to solve a prepared task, where pairs of reference and ar-
gument strokes are given and the argument strokes have to
be classified into 18 classes corresponding to several types
of accentuation or punctuation. The information about the
appearance and the relative position of the argument stroke
with respect to the reference stroke must be combined to-
gether to achieve a good recognition rate. This task ade-
quately demonstrates the need for a relative positioning sys-
tem. They used Radial Basis Function Networks (RBFN)
as a classifier. The method was further improved by a better
definition of fuzzy landscapes and using SVM by Delaye et
al. [7]. Although fuzzy relative positioning is a power-
ful method useful for more complex tasks such as recognition
of structured handwritten symbols (Chinese characters) [6], it
gives poor results when applied to arrow head detection.
Our work brings two contributions. First, we define ar-
row head detection as a classification of possible arrow head
strokes based on relative positioning. We used this arrow
head classifier to significantly improve the proposed arrow
detector. Second, we propose a new method for evaluation
of the relative position of strokes, which exploits simple
low-level features and uses Bidirectional Long Short Term
Memory (BLSTM) Recurrent Neural Network (RNN) as a
classifier. The BLSTM RNN proved to be a good tool for
classification of individual strokes [13].
The rest of the paper is organized as follows. Section 2
describes the proposed arrow detector and the way the rel-
ative positioning is exploited to determine which strokes
represent the head of the arrow. Section 3 introduces our
method for evaluation of the relative position. Experiments
and their results are described in Section 4. Finally, we
make a conclusion in Section 5.
2. Arrow detector
Arrows are symbols with a non-rigid body. They consist
of two parts: shaft and head. The head defines the orien-
tation of the arrow. However, an arrow's appearance can
vary arbitrarily within the given domain. Arrows can
have various shapes, lengths, heads, and directions.
Therefore, it is a difficult task to detect arrows with or-
dinary classifiers based on symbol appearance. However,
each arrow connects two other symbols with a rigid body
(see Figure 1). It is beneficial to detect these symbols first
and leave the arrow detection to another classifier detecting
arrows between pairs of these symbols. This new classifier
must perform the following two steps:
1. Find a shaft of the arrow connecting the given two
symbols. This shaft is just a sequence of strokes lead-
ing from a vicinity of the first symbol to a vicinity of
the second symbol and it is undirected.
2. Find a head of the arrow, which is located around one
of the end-points of the shaft. The head defines orien-
tation of the arrow (if it is heading from the first sym-
bol to the second symbol or vice versa).
The detection of an arrow’s shaft can be done iteratively
by simply adding strokes to a sequence such that the first
stroke starts in a vicinity of the first symbol and the last
stroke ends in a vicinity of the second symbol. A new stroke
is added to the sequence only if the distance between the
end-point of the last stroke and the end-point of the new
stroke is smaller than a threshold. The algorithm must con-
sider all possible combinations of strokes creating a valid
connection between the given two symbols.

Figure 2. Arrow recognition pipeline, illustrated on a simple example of two symbols from the FC domain. (Stages: detection of arrow shaft; extraction of reference strokes and points; query strokes search and classification at each end-point, giving head candidates A and B; selection of the best arrow head.)

The search space can be reasonably reduced by setting a maximal number of strokes in the sequence. This number depends on the
domain and on how many strokes users use to draw arrow
shafts. Typically, it is four and two for flowcharts and
finite automata, respectively. We can immediately remove
shafts which are in conflict with other shafts,
and keep those with the smallest sum of the following dis-
tances: a) distance between the first symbol and the first
stroke of the shaft, b) distance between the second symbol
and the last stroke of the shaft, c) distances between indi-
vidual strokes of the shaft.
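The shaft search and conflict scoring described above can be sketched as follows. This is a simplified illustration under our own assumptions (strokes reduced to their two end-points, a hypothetical gap threshold `thresh`), not the authors' implementation:

```python
import itertools
import math

def endpoint_dist(a, b):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def find_shafts(strokes, sym_a, sym_b, max_strokes=4, thresh=30.0):
    """Enumerate stroke sequences forming a candidate shaft from sym_a to sym_b.

    Each stroke is ((x1, y1), (x2, y2)) -- its two end-points; real strokes may
    be traversed in either direction, which this sketch ignores for brevity.
    sym_a / sym_b are reference points of the two symbols. Returns candidates
    sorted by the sum of gap distances (smaller is better), so conflicting
    shafts can be resolved by keeping the first one.
    """
    candidates = []
    for n in range(1, max_strokes + 1):
        for seq in itertools.permutations(strokes, n):
            # chain: symbol A -> stroke_1 -> ... -> stroke_n -> symbol B
            gaps = [endpoint_dist(sym_a, seq[0][0])]
            ok = True
            for prev, cur in zip(seq, seq[1:]):
                g = endpoint_dist(prev[1], cur[0])
                if g > thresh:           # consecutive strokes too far apart
                    ok = False
                    break
                gaps.append(g)
            if not ok:
                continue
            gaps.append(endpoint_dist(seq[-1][1], sym_b))
            if gaps[0] <= thresh and gaps[-1] <= thresh:
                candidates.append((sum(gaps), seq))
    candidates.sort(key=lambda c: c[0])  # smallest gap sum wins conflicts
    return candidates
```

A usage example: two collinear strokes bridging two symbol reference points chain into a single shaft candidate whose score is the sum of the three gaps.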
Since we do not know the orientation of the arrow yet
and the shaft is undirected, we have to consider both end-
points of the shaft and try to find two heads (one in the
vicinity of each end-point). Ideally we will be able to find
just one head. In practice, it can happen that we find two
heads and we have to decide which one is better. The de-
tection of an arrow's head is not a trivial task, because there
might be many interfering strokes around the end-points
of the shaft: heads of other arrows or text. Deciding
which strokes represent the true arrow's head and which do
not is a task where stroke positioning can be beneficially
used. First, we define a
reference stroke (a sub-stroke of the shaft) and a reference
point (end-point of the shaft), which are used to express
a relative position of query strokes (details follow in Sec-
tion 2.1). Second, this information about relative position is
given to a classifier making the decision. The query strokes
are all strokes in a vicinity of a given end-point of the shaft,
which are not a part of the shaft itself nor the two given
symbols. We make a classification into two classes: head
and not-head. Explanation for the evaluation of the relative
position of strokes and classification is given in Section 3.
Let us just note that the classifier returns a class into which
the query stroke is classified along with a potential. We use
this potential to decide which head is of better quality in the
case we find two. We just compute a sum of potentials of all
strokes in each head and choose the head with the big-
ger value. This slightly favours heads consisting of a higher
number of strokes, which is desirable in most cases. A
pseudocode for the algorithm that we just described is di-
vided into two procedures and presented in the supplemen-
tary material as Algorithm 1 and Algorithm 2. The arrow
recognition pipeline is depicted in Figure 2.
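The choice between the two competing head candidates can be sketched as follows; `potential` stands for the classifier confidence described above (the callable interface is our own illustrative assumption):

```python
def select_head(head_a_strokes, head_b_strokes, potential):
    """Pick the better of the two candidate heads (one per shaft end-point)
    by summing the classifier potentials of their strokes. Summing slightly
    favours heads consisting of more strokes, which is usually desirable."""
    score_a = sum(potential(s) for s in head_a_strokes)
    score_b = sum(potential(s) for s in head_b_strokes)
    return ('A', score_a) if score_a >= score_b else ('B', score_b)
```

With two strokes of potentials 0.9 and 0.8 at end-point A against a single stroke of potential 0.95 at end-point B, the summed score picks A, illustrating the bias toward multi-stroke heads.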
It happens quite often that the user draws a shaft and a
head of an arrow by one stroke. Our algorithm would fail
in that case. Therefore, we make one important step before
we try to find the arrow’s head – we segment the last stroke
of the shaft into smaller sub-strokes in such a way that the
head is split from the shaft. Created sub-strokes are divided
into two groups. One group is used to complete the shaft
such that it reaches the symbol again. Sub-strokes of the
second group are put into the set of query strokes possibly
forming the head. Our splitting algorithm is described in
Section 2.2. If the shaft and the head are not drawn by one
stroke, the algorithm will ideally perform no segmentation
and this step can be skipped.
2.1. Reference stroke and reference point
It is necessary to define a reference stroke. The position of
all query strokes will be evaluated relative to it.
Naturally, it seems that the arrow’s shaft should be the refer-
ence stroke. However, it is better to use just a sub-stroke of
the shaft for this purpose. The reason is that the shaft might
be arbitrarily curved or refracted, the whole arrow might
be arbitrarily rotated, and we want to normalize the input
in such a way that the reference stroke has always more or
less the same appearance and the query strokes have always
more or less the same relative position. Therefore, we cre-
ate a sub-stroke beginning at the end-point of the shaft with
a shape of a line segment. It is done iteratively by adding
points to the newly created stroke as long as the value of a
criterion, expressing how similar the stroke is to a line, stays
above a threshold. The criterion is the ratio of the distance
between the end-points of the stroke to the path length of the
stroke (the sum of distances between neighbouring points). We
set the threshold empirically to 0.95. Another condition is
that the distance between the end-points of the stroke must be
bigger than a threshold empirically derived from the aver-
age length of strokes, because the possible presence of so-
called hooks at the ends of strokes would cause a small value
of the criterion for short strokes. Figure 3 illustrates how the
reference stroke is determined as a sub-stroke of the shaft.
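The line-likeness criterion and the iterative growth of the reference sub-stroke might look as follows. This is a sketch with our own assumed defaults; the paper derives the minimum-length threshold from the average stroke length rather than fixing it:

```python
import math

def straightness(points):
    """Ratio of end-point distance to path length; 1.0 for a perfect line."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return math.dist(points[0], points[-1]) / path if path > 0 else 1.0

def reference_substroke(shaft_points, ratio_thresh=0.95, min_len=20.0):
    """Grow a line-like sub-stroke from the shaft's end-point.

    shaft_points are ordered so that shaft_points[0] is the reference
    end-point; min_len guards against hooks making short strokes look
    curved. Points are added while the stroke stays line-like.
    """
    sub = [shaft_points[0], shaft_points[1]]
    for p in shaft_points[2:]:
        if straightness(sub + [p]) < ratio_thresh and \
           math.dist(sub[0], sub[-1]) >= min_len:
            break                      # stroke starts to bend: stop growing
        sub.append(p)
    return sub
```

For a shaft running straight for a while and then turning, the sub-stroke stops at the bend, which is exactly the behaviour the criterion is meant to produce.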
Then we rotate the reference stroke and all query strokes
by such an angle that the vector given by the end-points of
the reference stroke points in the direction of the
x-axis. In other words, the true arrow
heads should then point from left to right. For purposes
of our method for evaluation of relative position of strokes
(described in Section 3), we have to define a reference point.
Obviously, it is the end-point of the shaft.
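The rotation that aligns the reference vector with the x-axis can be sketched as follows (our own minimal implementation, rotating all stroke points about the reference point):

```python
import math

def rotate_strokes(strokes, ref_start, ref_end):
    """Rotate strokes about ref_start so the vector ref_start -> ref_end
    points along the positive x-axis; true arrow heads then point
    left-to-right, normalizing the input for the head classifier."""
    ang = math.atan2(ref_end[1] - ref_start[1], ref_end[0] - ref_start[0])
    c, s = math.cos(-ang), math.sin(-ang)
    ox, oy = ref_start
    def rot(p):
        dx, dy = p[0] - ox, p[1] - oy
        return (ox + dx * c - dy * s, oy + dx * s + dy * c)
    return [[rot(p) for p in stroke] for stroke in strokes]
```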
Figure 3. Example showing a diagram and the way of choosing the reference point, the reference stroke, and the rotation. The individual pictures illustrate the following: (a) whole diagram with a highlighted (red) arrow to be detected, (b) the detected arrow's shaft is blue and the right end-point is considered to be the reference point, the second point is green, and the angle α used to rotate query strokes is marked, (c) rotation is done, the reference point is red as well as the strokes of the real arrow's head, (d) analogously to (b) with the other end-point considered, (e) analogously to (c) with the exception that there is no real head, because the arrow's orientation is wrong.
Because we still do not know the orientation of the arrow,
we have to consider both options: the arrow is heading to
the first symbol or the second symbol. Therefore, we define
two reference points, the end-points of the shaft. A reference
(sub)stroke is then associated with each of these two points.
Figure 3 shows the whole process of the reference stroke
extraction and rotation.
2.2. Stroke segmentation
Stroke segmentation is a very important field of research,
because it is a frequently used preprocessing step. Therefore,
there exist various papers dealing with this problem. The
segmentation is done by defining a set of splitting points.
The essential information is the curvature and speed defined
at each point and the geometric properties of stroke segments.
The common approach is to find tentative splitting points
with high curvature and low speed. The best subset of
these points is selected according to the error function fit-
ting points of each segment into selected primitives. The
most common primitives are line segments and arcs [8, 15].
It is also possible to use machine learning to train a classifier
detecting the splitting points [9, 11].
The presented algorithms are sophisticated and make it pos-
sible to find segments fitting predefined primitives. However,
using any of these methods seems to be overkill for our
task. We do not need to split a stroke at any precisely de-
fined point nor to create segments with particular geomet-
rical properties (line segments or arcs). All we need is to
split the arrow's head from its body, and it is not important
if both the body and the head are further split into sev-
eral segments. Therefore, we suggest using a much simpler
algorithm for stroke segmentation. Its description follows.
We compute a value AA, which we call "accumulated angle", associated to each point of the stroke S = {p_1, p_2, ..., p_n} according to the following equation:

AA_i = mean( Rank3{ A(i, 1), ..., A(i, min(i−1, n−i, R)) } ),   (1)

where i is the index of the point in the sequence, Rank3 is an operator choosing up to the three smallest values of a given set, R is the maximal radius, and A is a function computing the angle between two vectors defined by the index of the given reference point and its two neighbouring points chosen by the size of the radius. The function A is defined as follows:

A(i, r) = arccos( ((p_i − p_{i−r}) · (p_i − p_{i+r})) / (‖p_i − p_{i−r}‖ ‖p_i − p_{i+r}‖) ).   (2)

Let us note that AA_i is computed according to Equation (1) only for i ∈ {2, ..., n−1}, and AA_1 = AA_n = 0. We define the initial set of splitting points by taking points where AA reaches a local minimum and the value is smaller than mCoeff · mean{AA_1, ..., AA_n}. In the case that two splitting points are too close to each other (dist(p_i, p_j) < distThresh), we remove the one with the smaller AA value. We set mCoeff = 0.5 and distThresh = 200 empirically. After this removal, the segmentation is done.
We tested the described algorithm on arrows from the FA
database (see Section 4.1) which were drawn by one stroke
and it turned out that the algorithm split the head from the
body in 100 % of cases. Let us emphasize that the parameters
mCoeff and distThresh are tunable, which makes it easy to
adjust them for the demands of a given task.
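Under our reading of Equations (1) and (2), the segmentation can be sketched as follows. The radius R = 5 is our own illustrative choice; mCoeff and distThresh follow the paper:

```python
import math

def angle(points, i, r):
    """A(i, r): angle at p_i between the vectors to p_{i-r} and p_{i+r} (Eq. 2)."""
    pi, pm, pp = points[i], points[i - r], points[i + r]
    u = (pm[0] - pi[0], pm[1] - pi[1])
    v = (pp[0] - pi[0], pp[1] - pi[1])
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return math.pi
    cosv = max(-1.0, min(1.0, (u[0] * v[0] + u[1] * v[1]) / (nu * nv)))
    return math.acos(cosv)

def accumulated_angles(points, R=5):
    """AA_i: mean of up to the three smallest A(i, r), r <= min(i, n-1-i, R),
    with AA = 0 at both extreme points (Eq. 1, 0-based indices)."""
    n = len(points)
    aa = [0.0] * n
    for i in range(1, n - 1):
        rs = range(1, min(i, n - 1 - i, R) + 1)
        vals = sorted(angle(points, i, r) for r in rs)[:3]
        aa[i] = sum(vals) / len(vals)
    return aa

def splitting_points(points, R=5, m_coeff=0.5, dist_thresh=200.0):
    """Indices where AA has a local minimum below m_coeff * mean(AA);
    of two nearby splits (< dist_thresh apart) the smaller-AA one is kept."""
    aa = accumulated_angles(points, R)
    mean_aa = sum(aa) / len(aa)
    splits = [i for i in range(1, len(points) - 1)
              if aa[i] < aa[i - 1] and aa[i] < aa[i + 1]
              and aa[i] < m_coeff * mean_aa]
    kept = []
    for i in splits:
        if kept and math.dist(points[kept[-1]], points[i]) < dist_thresh:
            if aa[i] < aa[kept[-1]]:
                kept[-1] = i          # keep the split with the smaller AA
        else:
            kept.append(i)
    return kept
```

On a stroke going right and then sharply back up-left (a head-like corner), the single split lands on the corner point, which is the behaviour needed to cut a head off a shaft.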
3. Evaluation of relative position of strokes
Unlike the method by Bouteruche et al., where a query stroke is evaluated with respect to the whole reference stroke by evaluating its fuzzy structuring element, we propose to evaluate the relative position of a query stroke with respect to just a single point of the reference stroke. In the case of arrows, it is the end-point of the arrow's shaft. In the case of the task defined by Bouteruche et al., it can be an arbitrary fixed point; we propose to choose the center of the reference stroke's bounding box.

Figure 4. Example showing pairs of reference and query strokes and the extracted sequences of features (angles and distances) for both domains: (a) the arrow domain, (b) the accent domain. The reference point R is marked red. In the case of the arrow domain, both the reference and the query strokes are already rotated. The query stroke is a sequence of points {p_1, p_2, ..., p_n}.
We are given a reference stroke, which is represented by its reference point R, and a query stroke S defined by a sequence of its points: S = {p_1, p_2, ..., p_n}. To describe the relative position of S with respect to R, we express the relative position of each point p_i using polar coordinates. The position of each point is defined by the angle α_i between the vector Rp_i and the x-axis, and by the distance d_i = ‖Rp_i‖. We create a sample for each pair of reference and query strokes consisting of a sequence of the described features {[α_1, d_1], [α_2, d_2], ..., [α_n, d_n]} and a label indicating the class of the query stroke. For illustration, see Figure 4. We propose to use a (B)LSTM RNN as
a classifier, because it reaches the best results in many ap-
plications. However, it is possible to use different tools for
classifying sequences (e.g. Hidden Markov Models). When
dealing with neural networks, it makes sense to normalize the inputs:

v̂_k = (v_k − m_k) / σ_k,   (3)

where v_k is an input value, v̂_k is the normalized value, and m_k and σ_k are the mean and the standard deviation of all values of the same feature in the training database, respectively. We use this normalization to normalize the distance only.
The advantage of the proposed features is that they are simple
and easy to extract (low time complexity). More-
over, they express the relative position of the query stroke with
respect to the reference point as well as the shape of the
query stroke; it is possible to reconstruct the trajectory of
the query stroke from the sequence of features. This leads to
a simple implementation and fast evaluation.
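The feature extraction and the distance normalization of Equation (3) can be sketched as follows (a minimal illustration; training-set statistics stand in for m_k and σ_k):

```python
import math

def polar_features(ref_point, query_stroke):
    """Sequence of [angle, distance] pairs describing each query-stroke point
    relative to the reference point R: the angle to the x-axis and the
    Euclidean distance, i.e. polar coordinates centered at R."""
    rx, ry = ref_point
    return [[math.atan2(y - ry, x - rx), math.dist(ref_point, (x, y))]
            for (x, y) in query_stroke]

def normalize_distances(samples):
    """Z-normalize the distance feature across a training set (Eq. 3);
    angles are left unnormalized, matching the paper's choice."""
    ds = [d for seq in samples for (_, d) in seq]
    m = sum(ds) / len(ds)
    var = sum((d - m) ** 2 for d in ds) / len(ds)
    sd = math.sqrt(var) or 1.0
    return [[[a, (d - m) / sd] for (a, d) in seq] for seq in samples]
```

The resulting sequences of [α_i, d_i] pairs are what would be fed, sample by sample, to a sequence classifier such as a (B)LSTM RNN.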
4. Experiments
We made experiments on two tasks. The first one is the
task defined in this paper: classification of strokes into the two
classes head and not-head, where the reference stroke is a part
of the arrow's shaft. The second task is to classify argument
strokes representing accentuation or punctuation of their refer-
ence strokes into 18 classes. The task as well as the database
called ACCENT was proposed by Bouteruche et al. In the
case of arrows, we additionally evaluated the whole process
of arrow detection, where the stroke classification is a sub-
task. We used both positioning methods to solve both tasks
and we made a comparison. All experiments were done on
a standard tablet PC Lenovo X230 (Intel Core i5 2.6 GHz,
8GB RAM) with 64-bit Windows 7 operating system.
4.1. Arrows
We used the FA database for this experiment. The ver-
sion 1.1 contains the annotation of heads and shafts of ar-
rows. We extracted a reference point and stroke for each
arrow as described in Section 2.1. The only difference is
that the shaft is known from the annotation. We created
a set of query strokes and rotated these strokes according
to the reference stroke. We extracted features with respect
to the reference point or the reference stroke depending on
the used method for each query stroke and assigned a label
based on the annotation from the database. We refer to the
samples with the label head as positive and those with the
label not-head as negative samples. The FA database con-
sists of 12 diagram patterns drawn by several users and it is
split into training and test dataset. The training dataset con-
tains diagrams from 11 users (132 diagrams) and the test
dataset diagrams from 7 users (84 diagrams). Each dia-
gram is formed of 54 strokes and contains 5 symbols and
10 arrows on average. We extracted 1480/834 positive and
1263/1019 negative samples from the training/test dataset.
Arrows drawn by one stroke are manually segmented in the
database. However, to demonstrate our segmentation algo-
rithm from Section 2.2, we created a second test dataset (referred
to as test2), where we further segmented query strokes. Ob-
tained sub-strokes created new samples with the same label
as the original ones. We used this dataset to show that possi-
ble oversegmentation will not lower the final precision. We
created 1252 positive and 1876 negative examples this way.
For our method, we used LSTM and BLSTM RNNs im-
plemented within the library JANNLab [12]. We tried dif-
ferent numbers of nodes in the hidden layer to get the best
performance. We always trained the network for 200 epochs
with the following parameters: learning rate 0.001, momen-
tum 0.9. We achieved the best overall precision of 99.9%
with the BLSTM RNN with 32 nodes in the hidden layer.
However, it might be important to find a trade-off between
precision and time complexity and thus it might be better
to use the LSTM RNN with only 8 nodes in the hidden
layer, because it is significantly faster. It gives the precision
of 99.6 % and the average time needed for classification is
0.79 ms. For details, refer to Figure 5. The best achieved
precisions for individual classes are given in Table 1. The
achieved precision on test2 with the best trained neural
network did not decrease and reached 99.9 %.
For the method of Bouteruche et al., we used a RBFN
implemented within the library Encog [10]. We set the
number of the nodes in the hidden layer to be a power of
the number of features, which leads to equally spaced RBF
centers. It is the setting giving the best performance. We
tried two sets of features proposed by Bouteruche et al. re-
ferred in their paper by numbers 4 and 5 and we achieved
the accuracy of 95.4 % and 88.2 %, respectively. It is not
surprising that the feature set number 5 reached much worse
results. It contains features expressing how much a query
stroke fits into structuring elements of all classes. However,
in this case, we have just two classes and the class of nega-
tive samples contains arbitrarily shaped strokes and thus the
structuring elements are too wide. We also implemented the
method by Delaye et al. [7]. Their filtered fuzzy landscape
is an improvement of Bouteruche's feature set 5 and thus
it gives rather low precision for the very same reason. The
feature set number 4 gives much better results. However, it
was still inferior in comparison with our method – the best
overall precision of 95.36 %. For more detailed results see
again Table 1.
Since we use an RNN in our method, the classification has
a higher time complexity (especially with increasing com-
plexity of the net). The classification made by an RBFN is
indeed very fast. On the other hand, it is much faster to
extract the low level features we use: 0.016 ms per sample.
Feature extraction is slower in the case of fuzzy positioning:
2.89 ms per sample for the feature set number 4 and 0.99 ms
per sample for the feature set number 5.
Figure 5. Dependency of precision and time complexity on the number of nodes in the hidden layer of RNNs for the FA database. (Left: precision [%] vs. number of nodes, 2-32; right: time needed to classify one sample [ms]; curves for LSTM and BLSTM.)
Method positive negative overall
Ours 99.91 % 99.85 % 99.88 %
Bouteruche et al. (4) 98.56 % 92.75 % 95.36 %
Bouteruche et al. (5) 94.24 % 83.32 % 88.24 %
Delaye et al. 95.17 % 86.07 % 90.17 %
Table 1. Comparison of precisions for arrow heads detection.
4.1.1 Arrow detector test
We took all annotated symbols with rigid bodies and
tried to find arrows with the arrow detector we proposed
(query strokes for arrow heads were classified with our best
BLSTM RNN). We compared the detected arrows with an-
notated arrows. Let us recall that all pairs of symbols were
considered. Conflicting arrow shafts were removed imme-
diately. However, adding arrow heads may cause other
conflicts. The result of the arrow detector is a list of ar-
row candidates and a structural analysis should be done to
solve the conflicts. However, we tried to remove conflicts
by simply keeping arrows with higher confidence to see how
it affects recall and precision. The test dataset of the FA
database contains 796 arrows. We achieved the recall of
95.4 % / 94.2 % and the precision of 41.5 % / 95.4 % without/
with conflict removal, respectively. While searching for arrow
heads, our arrow detector performs 106.5 stroke classifications
per diagram on average, whereas there are 10 arrows
per diagram on average.
4.1.2 Diagram recognition pipeline test
We embedded our arrow detector into the diagram recogni-
tion pipeline proposed earlier [4] and made experiments on
the FA and FC databases. The FC database does not contain
annotation of arrow heads and shafts. Therefore, we used
the arrow head classifier trained on the FA database in both
cases. The results are shown in Tables 2 and 3. Although there
is an improvement in both domains, it is more significant
in the FA domain. The recognition accuracy increased in
all symbol classes, which shows that misrecognized arrows
can cause further errors in classification of other symbols.
Class | Correct stroke labeling [%] (Previous / Proposed) | Correct symbol segmentation and recognition [%] (Previous / Proposed)
Arrow | 89.3 / 94.9 | 84.4 / 92.8
Arrow in | 78.5 / 85.0 | 80.0 / 84.0
Final state | 96.1 / 99.2 | 93.8 / 98.4
State | 95.2 / 96.9 | 94.5 / 97.2
Label | 99.1 / 99.8 | 96.0 / 99.1
Total | 94.5 / 97.4 | 91.5 / 96.4
Table 2. Diagram recognition results for the FA domain.
Class | Correct stroke labeling [%] (Previous / Proposed) | Correct symbol segmentation and recognition [%] (Previous / Proposed)
Arrow | 85.3 / 88.7 | 74.4 / 78.1
Connection | 93.3 / 94.1 | 93.6 / 95.1
Data | 95.6 / 96.4 | 88.8 / 90.6
Decision | 90.8 / 90.9 | 74.1 / 75.3
Process | 93.7 / 95.2 | 87.2 / 88.1
Terminator | 89.7 / 90.2 | 88.1 / 88.9
Text | 99.0 / 99.3 | 87.9 / 89.7
Total | 95.2 / 96.5 | 82.8 / 84.43
Table 3. Diagram recognition results for the FC domain.
4.2. Accent
The Accent database consists of pairs of reference and
argument strokes. The task is to classify the argument
strokes into 18 graphic gestures. Two of them correspond
to the addition of a stroke to a character. The 16 others (see
Figure 6) correspond to an accentuation of their reference
character (acute, grave, cedilla, etc.), to a punctuation sym-
bol (comma, dot, apostrophe, etc.), or to an editing gesture
(space, carriage return, etc.). As several subsets of gestures
have the same shape, the only way to discriminate them is
to use spatial context – their relative position. The exam-
ples of the benchmark have been written on a PDA by 14
writers. The training database contains 4243 examples of 8
writers and the test database contains 2393 examples of 6
writers. None of the writers is common to both data sets.
Figure 6. Classes of the argument strokes in ACCENT database.
To apply our method, we set the center of each reference
stroke's bounding box as the reference point and extracted
features. We tried LSTM and BLSTM RNNs the same
way as in the case of the Arrow database. However, we
achieved the precision of only 91.9 %. It turned out that
our features have a problem distinguishing very small ar-
gument strokes like acute, apostrophe, or dieresis. These
strokes often consist of just one single point. Therefore,
tures describing the appearance of strokes. We used four
features introduced by Otte et al. [13]: an index of the point
to distinguish long and short strokes, sine and cosine of the
angle between the current and the last line segment (zero
for extreme points), and sum of lengths of the current and
the previous line segments. Let us note that the point in-
dices and distances are normalized (3). We refer to the two
sets of features and associated experiments as basic and ex-
tended. We achieved the best precision with the extended
features and the BLSTM RNN with 32 nodes in the hidden
layer, which was 93.6 %. The training was done again with
the learning rate of 0.001 and the momentum of 0.9. The
ROC curves and time complexities are shown in Figure 7.
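The four per-point local features above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name is ours, and normalizing by the stroke's bounding-box diagonal is an assumption standing in for the paper's Eq. (3).

```python
import math

def extended_local_features(stroke):
    """Per-point local features in the spirit of Otte et al. [13]:
    normalized point index, sine/cosine of the turning angle between
    the current and previous line segments (zero for extreme points),
    and the summed length of the two adjacent segments.
    `stroke` is a list of (x, y) tuples; lengths are normalized by the
    bounding-box diagonal (our assumption in place of Eq. (3))."""
    n = len(stroke)
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    feats = []
    for i in range(n):
        idx = i / (n - 1) if n > 1 else 0.0
        sin_a = cos_a = 0.0        # zero at extreme points
        seg_len = 0.0
        if i > 0:
            cur = (xs[i] - xs[i - 1], ys[i] - ys[i - 1])
            seg_len += math.hypot(*cur)
            if i > 1:
                prev = (xs[i - 1] - xs[i - 2], ys[i - 1] - ys[i - 2])
                seg_len += math.hypot(*prev)
                lp, lc = math.hypot(*prev), math.hypot(*cur)
                if lp > 0 and lc > 0:
                    # dot product gives cosine, cross product gives sine
                    cos_a = (prev[0] * cur[0] + prev[1] * cur[1]) / (lp * lc)
                    sin_a = (prev[0] * cur[1] - prev[1] * cur[0]) / (lp * lc)
        feats.append((idx, sin_a, cos_a, seg_len / diag))
    return feats
```

For a right-angle stroke such as [(0, 0), (1, 0), (1, 1)], the last point gets sine 1.0 (a 90-degree left turn), while the end points get zero turning features.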
For the method of Bouteruche et al. [2], we used our own reimplementation to run the experiments. We confirm their stated result, a precision of 95.75 %.
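For context, the directional component underlying such fuzzy relative positioning can be sketched as below. This is a simplified, Bloch-style score, assuming a linear angular membership and a point-wise maximum over the reference stroke; it is not necessarily the exact formulation of [2].

```python
import math

def fuzzy_direction_score(reference_pts, argument_pt, alpha):
    """Degree (in [0, 1]) to which `argument_pt` lies 'in direction
    alpha' (radians) of the reference stroke. Each reference point
    votes with a linear membership f(theta) = max(0, 1 - 2*theta/pi),
    where theta is the angular deviation from alpha; the stroke-level
    score is the maximum over all reference points."""
    best = 0.0
    ax, ay = argument_pt
    for rx, ry in reference_pts:
        if (ax, ay) == (rx, ry):
            continue  # direction undefined for coincident points
        theta = abs(math.atan2(ay - ry, ax - rx) - alpha)
        theta = min(theta, 2 * math.pi - theta)  # wrap into [0, pi]
        best = max(best, max(0.0, 1.0 - 2.0 * theta / math.pi))
    return best
```

A point due east of the reference scores 1.0 for alpha = 0, a point due north scores 0.0, and a diagonal point scores 0.5; a full positioning model combines several such directional scores with distance information.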
5. Conclusions
We have shown how important and difficult a task arrow recognition is for the whole process of diagram recognition. We designed an arrow recognizer which detects arrows in two steps: a) detection of the arrow's shaft, b) detection of the arrow's head. The first step is easy because the search for a shaft is guided by the detected symbols connected by the arrow. For the second step, we proposed a novel arrow head
[Figure 7: two panels, "Precision of RNNs" (precision [%]) and "Time needed to classify one sample" (time [ms]), plotted against the number of nodes in the hidden layer (2, 4, 8, 16, 32, 64), with curves for Basic LSTM, Basic BLSTM, Extended LSTM, and Extended BLSTM.]
Figure 7. Dependency of precision and running time on the num-
ber of nodes in the hidden layer of RNNs for ACCENT database.
classifier based on relative stroke positioning. We presented a classification method based on low-level features using (B)LSTM RNNs. We embedded the proposed arrow detector into the diagram recognition pipeline and increased the accuracy of the state-of-the-art diagram recognizer on the benchmark databases of finite automata and flowcharts.
We have also made a comparison with the state-of-the-art fuzzy relative positioning method. That method is unable to solve the proposed arrow detection task adequately and reaches inferior precision. However, on the task for which it was developed, our method gives slightly worse results. This implies that fuzzy positioning might be a good solution for certain tasks (data), but it is not a general tool. Our method, on the other hand, seems to be more general, since it gave relatively good results in both cases. Even where it gives slightly worse results, it may be a good alternative thanks to its simplicity and fast feature extraction.
Acknowledgment
The first author was supported by the Grant Agency of the CTU under project SGS13/205/OHK3/3T/13. The second and third authors were supported by the Grant Agency of the Czech Republic under project P103/10/0783 and by the Technology Agency of the Czech Republic under project TE01020197 Center for Applied Cybernetics, respectively.
References
[1] A.-M. Awal, G. Feng, H. Mouchere, and C. Viard-Gaudin.
First experiments on a new online handwritten flowchart
database. In DRR 2011, pages 1–10, 2011.
[2] F. Bouteruche, S. Macé, and E. Anquetil. Fuzzy relative positioning for on-line handwritten stroke analysis. In Proceedings of IWFHR 2006, pages 391–396, 2006.
[3] M. Bresler, D. Průša, and V. Hlaváč. Modeling flowchart structure recognition as a max-sum problem. In Proceedings of ICDAR 2013, pages 1247–1251, August 2013.
[4] M. Bresler, T. V. Phan, D. Průša, M. Nakagawa, and V. Hlaváč. Recognition system for on-line sketched diagrams. In Proceedings of ICFHR 2014, pages 563–568, September 2014.
[5] C. Carton, A. Lemaitre, and B. Coüasnon. Fusion of statistical and structural information for flowchart recognition. In Proceedings of ICDAR 2013, pages 1210–1214, 2013.
[6] A. Delaye and E. Anquetil. Fuzzy relative positioning templates for symbol recognition. In Proceedings of ICDAR 2011, pages 1220–1224, September 2011.
[7] A. Delaye, S. Macé, and E. Anquetil. Modeling relative positioning of handwritten patterns. In Proceedings of IGS 2009, pages 122–127, 2009.
[8] M. El Meseery, M. El Din, S. Mashali, M. Fayek, and N. Darwish. Sketch recognition using particle swarm algorithms. In Proceedings of ICIP 2009, pages 2017–2020, 2009.
[9] G. Feng and C. Viard-Gaudin. Stroke fragmentation based
on geometry features and HMM. CoRR, 2008.
[10] Heaton Research, Inc. Encog Machine Learning Framework,
2013. http://www.heatonresearch.com/encog.
[11] J. Herold and T. F. Stahovich. Classyseg: A machine learning
approach to automatic stroke segmentation. In Proceedings
of SBIM 2011, pages 109–116, 2011.
[12] S. Otte, D. Krechel, and M. Liwicki. JANNLab Neural Network Framework for Java. In Proceedings of MLDM 2013, pages 39–46, 2013.
[13] S. Otte, D. Krechel, M. Liwicki, and A. Dengel. Local
feature based online mode detection with recurrent neural
networks. In Proceedings of ICFHR 2012, pages 531–535,
2012.
[14] A. Stoffel, E. Tapia, and R. Rojas. Recognition of on-line handwritten commutative diagrams. In Proceedings of ICDAR 2009, pages 1211–1215, 2009.
[15] A. Wolin, B. Paulson, and T. Hammond. Sort, merge, repeat: An algorithm for effectively finding corners in hand-sketched strokes. In Proceedings of SBIM 2009, pages 93–99, 2009.