Detection of Arrows in On-line Sketched Diagrams
using Relative Stroke Positioning
Martin Bresler, Daniel Průša, Václav Hlaváč
Czech Technical University in Prague, Faculty of Electrical Engineering,
Department of Cybernetics, Technická 2, 166 27 Praha 6, Czech Republic
{breslmar, prusapa1, hlavac}@cmp.felk.cvut.cz
Abstract
This paper deals with recognition of arrows in online
sketched diagrams. Arrows have varying appearance and
thus it is a difficult task to recognize them directly. It is ben-
eficial to detect arrows after other symbols (easier to de-
tect) are already found. We proposed [4] an arrow detector
which searches for arrows as arbitrarily shaped connectors
between already found symbols. The detection is done in two
steps: a) a search for a shaft of the arrow, b) a search for
its head. The first step is relatively easy. However, it might
be quite difficult to find the head reliably. This paper brings
two contributions. The first contribution is a design of an
arrow recognizer where the head is detected using relative
strokes positioning. We embedded this recognizer into the
diagram recognition pipeline proposed earlier [4] and in-
creased the overall accuracy. The second contribution is an
introduction of a new approach to evaluate the relative posi-
tion of two given strokes with neural networks (LSTM). This
approach is an alternative to the fuzzy relative positioning
proposed by Bouteruche et al. [2]. We made a comparison
between the two methods through experiments performed on
two datasets for two different tasks. First, we used a bench-
mark database of hand-drawn finite automata to evaluate
detection of arrows. Second, we used a database presented
in the paper by Bouteruche et al. containing pairs of ref-
erence and argument strokes, where argument strokes are
classified into 18 classes. Our method gave significantly
better results for the first task and comparable results for
the second task.
1. Introduction
This paper deals with on-line handwriting recognition,
where the input consists of a sequence of strokes. A stroke
is a sequence of points captured by an ink-input device
(most commonly a tablet or a tablet PC) as the user was writ-
ing with a stylus or a finger. In handwriting recognition,
the research has already moved from recognition of plain
text to recognition of more structured input such as diagrams.
This work is focused on recognition of arrows in on-line
sketched diagrams.
Arrows are the most important symbols in diagrams,
since they bear the most valuable information about the
diagram structure – which symbols are connected together.
However, it is a difficult task to recognize them because
of their varying appearance. We consider two diagram do-
mains – finite automata (FA) and flowcharts (FC). There is
a freely available benchmark database for each of
the domains: the FA database [4] and the FC database [1].
Figure 1 shows examples of diagrams from these two do-
mains. It is obvious that arrows can be arbitrarily directed
and their shafts might be straight lines, curved lines, or
polylines. Moreover, their heads can have different shapes.
There exists an approach, where arrows are detected first
and the knowledge of arrows helps to naturally segment the
rest of the symbols [14]. The problem is that the authors of
this approach put very strict requirements on the way the arrow
is drawn. It must consist of one or two strokes and the ar-
row’s head must have only one predefined shape. Another
approach is to detect arrows the same way as other sym-
bols – using a classifier based on the symbol appearance.
Since the arrows might be arbitrarily rotated and the heads
might have different shapes, it is necessary to create several
arrow sub-classes. This approach is more general, but the
achieved accuracy is limited. The state-of-the-art methods
in flowchart recognition consistently achieve very low accu-
racy in arrow recognition [5, 3]. We already suggested [4]
that it is better to detect arrows after the other symbols
are detected. We proposed an algorithm, which searches
for arrows as arbitrarily shaped connectors between already
found non-arrow symbols. It works in two stages: a) ar-
row shaft detection, b) arrow head detection. The detection
of arrow head is based on heuristics and does not achieve
satisfactory precision. In this paper, we employ machine
learning to improve the proposed arrow detector with arrow
head classifier based on relative strokes positioning.
Figure 1. Examples of hand-drawn diagrams containing arrows connecting symbols with rigid bodies: (a) finite automata, (b) flowchart.
In many cases, appearance does not give us enough in-
formation to classify single strokes and we need some con-
textual information. Relative position of a stroke with re-
spect to a reference stroke is the most intuitive. Bouteruche
et al. [2] addressed this problem directly and proposed a
fuzzy relative positioning method. The authors introduced
a method evaluating the relative position of strokes based
on how well pairs of strokes fulfil a set of relations, such
as "the second stroke is to the right of the first stroke",
through defined fuzzy landscapes. They used this method
to solve a prepared task, where pairs of reference and ar-
gument strokes are given and the argument strokes have to
be classified into 18 classes corresponding to several types
of accentuation or punctuation. The information about the
appearance and the relative position of the argument stroke
with respect to the reference stroke must be combined to-
gether to achieve a good recognition rate. This task ade-
quately demonstrates the need for a relative positioning sys-
tem. They used Radial Basis Function Networks (RBFN)
as a classifier. The method was further improved by a better
definition of fuzzy landscapes and using SVM by Delaye et
al. [7]. Although fuzzy relative positioning is a power-
ful method useful for more complex tasks such as recognition
of structured handwritten symbols (Chinese characters) [6], it
gives poor results when applied to arrow head detection.
Our work brings two contributions. First, we define ar-
row head detection as a classification of possible arrow head
strokes based on relative positioning. We used this arrow
head classifier to significantly improve the proposed arrow
detector. Second, we propose a new method for evaluation
of the relative position of strokes, which exploits simple
low-level features and uses Bidirectional Long Short Term
Memory (BLSTM) Recurrent Neural Network (RNN) as a
classifier. The BLSTM RNN proved to be a good tool for
classification of individual strokes [13].
The rest of the paper is organized as follows. Section 2
describes the proposed arrow detector and the way the rel-
ative positioning is exploited to determine which strokes
represent the head of the arrow. Section 3 introduces our
method for evaluation of the relative position. Experiments
and their results are described in Section 4. Finally, we
make a conclusion in Section 5.
2. Arrow detector
Arrows are symbols with a non-rigid body. They consist
of two parts: shaft and head. The head defines the orien-
tation of the arrow. However, an arrow's appearance can
vary arbitrarily within the given domain. Arrows can
have various shapes, lengths, heads, and directions.
Therefore, it is a difficult task to detect arrows with or-
dinary classifiers based on symbol appearance. However,
each arrow connects two other symbols with a rigid body
(see Figure 1). It is beneficial to detect these symbols first
and leave the arrow detection to another classifier detecting
arrows between pairs of these symbols. This new classifier
must perform the following two steps:
1. Find a shaft of the arrow connecting the given two
symbols. This shaft is just a sequence of strokes lead-
ing from a vicinity of the first symbol to a vicinity of
the second symbol and it is undirected.
2. Find a head of the arrow, which is located around one
of the end-points of the shaft. The head defines orien-
tation of the arrow (if it is heading from the first sym-
bol to the second symbol or vice versa).
The detection of an arrow’s shaft can be done iteratively
by simply adding strokes to a sequence such that the first
stroke starts in a vicinity of the first symbol and the last
stroke ends in a vicinity of the second symbol. A new stroke
is added to the sequence only if the distance between the
end-point of the last stroke and the end-point of the new
stroke is smaller than a threshold. The algorithm must con-
sider all possible combinations of strokes creating a valid
connection between the given two symbols.

Figure 2. Arrow recognition pipeline, illustrated on a simple example of two symbols from the FC domain. (Stages: detection of arrow shaft; extraction of reference strokes and points; query strokes search and classification at each end-point, giving head candidates A and B; selection of the best arrow head.)

The search space can be reasonably reduced by setting a maximal number of strokes in the sequence. This number depends on the
domain and on how many strokes users use to draw arrow
shafts. Typically, it is four and two for flowcharts and
finite automata, respectively. We can immediately remove
shafts which are in conflict with other shafts,
and keep those with the smallest sum of the following dis-
tances: a) distance between the first symbol and the first
stroke of the shaft, b) distance between the second symbol
and the last stroke of the shaft, c) distances between indi-
vidual strokes of the shaft.
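The shaft search and conflict scoring described above can be sketched as follows. This is a simplified illustration under our own assumptions (strokes reduced to their two end-points, a hypothetical gap threshold `thresh`), not the authors' implementation:

```python
import itertools
import math

def endpoint_dist(a, b):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def find_shafts(strokes, sym_a, sym_b, max_strokes=4, thresh=30.0):
    """Enumerate stroke sequences forming a candidate shaft from sym_a to sym_b.

    Each stroke is ((x1, y1), (x2, y2)) -- its two end-points; real strokes may
    be traversed in either direction, which this sketch ignores for brevity.
    sym_a / sym_b are reference points of the two symbols. Returns candidates
    sorted by the sum of gap distances (smaller is better), so conflicting
    shafts can be resolved by keeping the first one.
    """
    candidates = []
    for n in range(1, max_strokes + 1):
        for seq in itertools.permutations(strokes, n):
            # chain: symbol A -> stroke_1 -> ... -> stroke_n -> symbol B
            gaps = [endpoint_dist(sym_a, seq[0][0])]
            ok = True
            for prev, cur in zip(seq, seq[1:]):
                g = endpoint_dist(prev[1], cur[0])
                if g > thresh:           # consecutive strokes too far apart
                    ok = False
                    break
                gaps.append(g)
            if not ok:
                continue
            gaps.append(endpoint_dist(seq[-1][1], sym_b))
            if gaps[0] <= thresh and gaps[-1] <= thresh:
                candidates.append((sum(gaps), seq))
    candidates.sort(key=lambda c: c[0])  # smallest gap sum wins conflicts
    return candidates
```

A usage example: two collinear strokes bridging two symbol reference points chain into a single shaft candidate whose score is the sum of the three gaps.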
Since we do not know the orientation of the arrow yet
and the shaft is undirected, we have to consider both end-
points of the shaft and try to find two heads (one in the
vicinity of each end-point). Ideally we will be able to find
just one head. In practice, it can happen that we find two
heads and we have to decide which one is better. The de-
tection of an arrow's head is not a trivial task, because there
might be many interfering strokes around the end-points
of the shaft: heads of other arrows or text. Deciding
which strokes represent the true arrow's head and which do
not is a task where stroke positioning can be beneficially
used. First, we define a
reference stroke (a sub-stroke of the shaft) and a reference
point (end-point of the shaft), which are used to express
a relative position of query strokes (details follow in Sec-
tion 2.1). Second, this information about relative position is
given to a classifier making the decision. The query strokes
are all strokes in a vicinity of a given end-point of the shaft,
which are not a part of the shaft itself nor the two given
symbols. We make a classification into two classes: head
and not-head. Explanation for the evaluation of the relative
position of strokes and classification is given in Section 3.
Let us just note that the classifier returns a class into which
the query stroke is classified along with a potential. We use
this potential to decide which head is of better quality in the
case we find two. We just compute a sum of potentials of all
strokes in each head and choose the head with the big-
ger value. This slightly favours heads consisting of a higher
number of strokes, which is desirable in most cases. A
pseudocode for the algorithm that we just described is di-
vided into two procedures and presented in the supplemen-
tary material as Algorithm 1 and Algorithm 2. The arrow
recognition pipeline is depicted in Figure 2.
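The choice between the two competing head candidates can be sketched as follows; `potential` stands for the classifier confidence described above (the callable interface is our own illustrative assumption):

```python
def select_head(head_a_strokes, head_b_strokes, potential):
    """Pick the better of the two candidate heads (one per shaft end-point)
    by summing the classifier potentials of their strokes. Summing slightly
    favours heads consisting of more strokes, which is usually desirable."""
    score_a = sum(potential(s) for s in head_a_strokes)
    score_b = sum(potential(s) for s in head_b_strokes)
    return ('A', score_a) if score_a >= score_b else ('B', score_b)
```

With two strokes of potentials 0.9 and 0.8 at end-point A against a single stroke of potential 0.95 at end-point B, the summed score picks A, illustrating the bias toward multi-stroke heads.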
It happens quite often that the user draws a shaft and a
head of an arrow by one stroke. Our algorithm would fail
in that case. Therefore, we make one important step before
we try to find the arrow’s head – we segment the last stroke
of the shaft into smaller sub-strokes in such a way that the
head is split from the shaft. Created sub-strokes are divided
into two groups. One group is used to complete the shaft
such that it reaches the symbol again. Sub-strokes of the
second group are put into the set of query strokes possibly
forming the head. Our splitting algorithm is described in
Section 2.2. If the shaft and the head are not drawn by one
stroke, the algorithm will ideally perform no segmentation
and this step can be skipped.
2.1. Reference stroke and reference point
It is necessary to define a reference stroke. The position of
all query strokes will be evaluated relative to it.
Naturally, it seems that the arrow’s shaft should be the refer-
ence stroke. However, it is better to use just a sub-stroke of
the shaft for this purpose. The reason is that the shaft might
be arbitrarily curved or refracted, the whole arrow might
be arbitrarily rotated, and we want to normalize the input
in such a way that the reference stroke has always more or
less the same appearance and the query strokes have always
more or less the same relative position. Therefore, we cre-
ate a sub-stroke beginning at the end-point of the shaft with
a shape of a line segment. It is done iteratively by adding
points to the newly created stroke as long as the value of a
criterion, expressing how similar the stroke is to a line, stays
above a threshold. The criterion is the ratio of the distance
between the end-points of the stroke to the path length of the
stroke (the sum of distances between neighbouring points). We
set the threshold empirically to 0.95. Another condition is
that the distance between the end-points of the stroke must be
bigger than a threshold empirically derived from the aver-
age length of strokes, because the possible presence of so-
called hooks at the ends of strokes would cause a small value
of the criterion for short strokes. Figure 3 illustrates how the
reference stroke is determined as a sub-stroke of the shaft.
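The line-likeness criterion and the iterative growth of the reference sub-stroke might look as follows. This is a sketch with our own assumed defaults; the paper derives the minimum-length threshold from the average stroke length rather than fixing it:

```python
import math

def straightness(points):
    """Ratio of end-point distance to path length; 1.0 for a perfect line."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return math.dist(points[0], points[-1]) / path if path > 0 else 1.0

def reference_substroke(shaft_points, ratio_thresh=0.95, min_len=20.0):
    """Grow a line-like sub-stroke from the shaft's end-point.

    shaft_points are ordered so that shaft_points[0] is the reference
    end-point; min_len guards against hooks making short strokes look
    curved. Points are added while the stroke stays line-like.
    """
    sub = [shaft_points[0], shaft_points[1]]
    for p in shaft_points[2:]:
        if straightness(sub + [p]) < ratio_thresh and \
           math.dist(sub[0], sub[-1]) >= min_len:
            break                      # stroke starts to bend: stop growing
        sub.append(p)
    return sub
```

For a shaft running straight for a while and then turning, the sub-stroke stops at the bend, which is exactly the behaviour the criterion is meant to produce.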
Then we rotate the reference stroke and all query strokes
by such an angle that the vector given by the end-points of
the reference stroke points in the direction of the
x-axis. In other words, the true arrow
heads should then point from left to right. For purposes
of our method for evaluation of relative position of strokes
(described in Section 3), we have to define a reference point.
Obviously, it is the end-point of the shaft.
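The rotation that aligns the reference vector with the x-axis can be sketched as follows (our own minimal implementation, rotating all stroke points about the reference point):

```python
import math

def rotate_strokes(strokes, ref_start, ref_end):
    """Rotate strokes about ref_start so the vector ref_start -> ref_end
    points along the positive x-axis; true arrow heads then point
    left-to-right, normalizing the input for the head classifier."""
    ang = math.atan2(ref_end[1] - ref_start[1], ref_end[0] - ref_start[0])
    c, s = math.cos(-ang), math.sin(-ang)
    ox, oy = ref_start
    def rot(p):
        dx, dy = p[0] - ox, p[1] - oy
        return (ox + dx * c - dy * s, oy + dx * s + dy * c)
    return [[rot(p) for p in stroke] for stroke in strokes]
```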
Figure 3. Example showing a diagram and the way of choosing the reference point, the reference stroke, and the rotation. The individual pictures illustrate the following: (a) whole diagram with a highlighted (red) arrow to be detected, (b) the detected arrow's shaft is blue and the right end-point is considered to be the reference point, the second point is green, and the angle α used to rotate query strokes is marked, (c) rotation is done, the reference point is red as well as the strokes of the real arrow's head, (d) analogously to (b) with the other end-point considered, (e) analogously to (c) with the exception that there is no real head, because the arrow's orientation is wrong.
Because we still do not know the orientation of the arrow,
we have to consider both options: the arrow is heading to
the first symbol or the second symbol. Therefore, we define
two reference points, the end-points of the shaft. A reference
(sub)stroke is then associated with each of these two points.
Figure 3 shows the whole process of the reference stroke
extraction and rotation.
2.2. Stroke segmentation
Stroke segmentation is a very important field of research,
because it is a frequently used preprocessing step. Therefore,
there exist various papers dealing with this problem. The
segmentation is done by defining a set of splitting points.
The essential information is the curvature and speed defined
at each point and the geometric properties of stroke segments.
The common approach is to find tentative splitting points
with high curvature and low speed. The best subset of
these points is selected according to the error function fit-
ting points of each segment into selected primitives. The
most common primitives are line segments and arcs [8, 15].
It is also possible to use machine learning to train a classifier
detecting the splitting points [9, 11].
The presented algorithms are sophisticated and make it pos-
sible to find segments fitting predefined primitives. However,
using any of these methods seems to be overkill for our
task. We do not need to split a stroke at any precisely de-
fined point nor to create segments with particular geomet-
rical properties (line segments or arcs). All we need is to
split the arrow's head from its body, and it is not important
if both the body and the head are further split into sev-
eral segments. Therefore, we suggest using a much simpler
algorithm for stroke segmentation. Its description follows.
We compute a value AA, which we call "accumulated angle", associated to each point of the stroke S = {p_1, p_2, ..., p_n} according to the following equation:

AA_i = mean( Rank3{ A(i, 1), ..., A(i, min(i−1, n−i, R)) } ),   (1)

where i is the index of the point in the sequence, Rank3 is an operator choosing up to the three smallest values of a given set, R is the maximal radius, and A is a function computing the angle between two vectors defined by the index of the given reference point and its two neighbouring points chosen by the size of the radius. The function A is defined as follows:

A(i, r) = arccos( ((p_i − p_{i−r}) · (p_i − p_{i+r})) / (‖p_i − p_{i−r}‖ ‖p_i − p_{i+r}‖) ).   (2)

Let us note that AA_i is computed according to Equation (1) only for i ∈ {2, ..., n−1}, and AA_1 = AA_n = 0. We define the initial set of splitting points by taking points where AA reaches a local minimum and the value is smaller than mCoeff · mean{AA_1, ..., AA_n}. In the case that two splitting points are too close to each other (dist(p_i, p_j) < distThresh), we remove the one with the smaller AA value. We set mCoeff = 0.5 and distThresh = 200 empirically. After this removal, the segmentation is done.
We tested the described algorithm on arrows from the FA
database (see Section 4.1) which were drawn by one stroke
and it turned out that the algorithm split the head from the
body in 100 % of cases. Let us emphasize that the parameters
mCoeff and distThresh are tunable, which makes it easy to
adjust them for the demands of a given task.
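Under our reading of Equations (1) and (2), the segmentation can be sketched as follows. The radius R = 5 is our own illustrative choice; mCoeff and distThresh follow the paper:

```python
import math

def angle(points, i, r):
    """A(i, r): angle at p_i between the vectors to p_{i-r} and p_{i+r} (Eq. 2)."""
    pi, pm, pp = points[i], points[i - r], points[i + r]
    u = (pm[0] - pi[0], pm[1] - pi[1])
    v = (pp[0] - pi[0], pp[1] - pi[1])
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return math.pi
    cosv = max(-1.0, min(1.0, (u[0] * v[0] + u[1] * v[1]) / (nu * nv)))
    return math.acos(cosv)

def accumulated_angles(points, R=5):
    """AA_i: mean of up to the three smallest A(i, r), r <= min(i, n-1-i, R),
    with AA = 0 at both extreme points (Eq. 1, 0-based indices)."""
    n = len(points)
    aa = [0.0] * n
    for i in range(1, n - 1):
        rs = range(1, min(i, n - 1 - i, R) + 1)
        vals = sorted(angle(points, i, r) for r in rs)[:3]
        aa[i] = sum(vals) / len(vals)
    return aa

def splitting_points(points, R=5, m_coeff=0.5, dist_thresh=200.0):
    """Indices where AA has a local minimum below m_coeff * mean(AA);
    of two nearby splits (< dist_thresh apart) the smaller-AA one is kept."""
    aa = accumulated_angles(points, R)
    mean_aa = sum(aa) / len(aa)
    splits = [i for i in range(1, len(points) - 1)
              if aa[i] < aa[i - 1] and aa[i] < aa[i + 1]
              and aa[i] < m_coeff * mean_aa]
    kept = []
    for i in splits:
        if kept and math.dist(points[kept[-1]], points[i]) < dist_thresh:
            if aa[i] < aa[kept[-1]]:
                kept[-1] = i          # keep the split with the smaller AA
        else:
            kept.append(i)
    return kept
```

On a stroke going right and then sharply back up-left (a head-like corner), the single split lands on the corner point, which is the behaviour needed to cut a head off a shaft.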
3. Evaluation of relative position of strokes
Unlike the method by Bouteruche et al., where a query stroke is evaluated with respect to the whole reference stroke by evaluating its fuzzy structuring element, we propose to evaluate the relative position of a query stroke with respect to just a single point of the reference stroke. In the case of arrows, it is the end-point of the arrow's shaft. In the case of the task defined by Bouteruche et al., it can be an arbitrary fixed point; we propose to choose the center of the reference stroke's bounding box.

Figure 4. Example showing pairs of reference and query strokes and the extracted sequences of features (angles and distances) for both domains: (a) the arrow domain, (b) the accent domain. The reference point R is marked red. In the case of the arrow domain, both the reference and the query strokes are already rotated. The query stroke is a sequence of points {p_1, p_2, ..., p_n}.
We are given a reference stroke, which is represented by its reference point R, and a query stroke S defined by a sequence of its points: S = {p_1, p_2, ..., p_n}. To describe the relative position of S with respect to R, we express the relative position of each point p_i using polar coordinates. The position of each point is defined by the angle α_i between the vector Rp_i and the x-axis, and by the distance d_i = ‖Rp_i‖. We create a sample for each pair of reference and query strokes consisting of a sequence of the described features {[α_1, d_1], [α_2, d_2], ..., [α_n, d_n]} and a label indicating the class of the query stroke. For illustration, see Figure 4. We propose to use a (B)LSTM RNN as
a classifier, because it reaches the best results in many ap-
plications. However, it is possible to use different tools for
classifying sequences (e.g. Hidden Markov Models). When
dealing with neural networks, it makes sense to normalize the inputs:

v̂_k = (v_k − m_k) / σ_k,   (3)

where v_k is an input value, v̂_k is the normalized value, and m_k and σ_k are the mean and the standard deviation of all values of the same feature in the training database, respectively. We use this normalization to normalize the distance only.
The advantage of the proposed features is that they are simple
and easy to extract (low time complexity). More-
over, they express the relative position of the query stroke with
respect to the reference point as well as the shape of the
query stroke; it is possible to reconstruct the trajectory of
the query stroke from the sequence of features. This leads to
a simple implementation and fast evaluation.
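The feature extraction and the distance normalization of Equation (3) can be sketched as follows (a minimal illustration; training-set statistics stand in for m_k and σ_k):

```python
import math

def polar_features(ref_point, query_stroke):
    """Sequence of [angle, distance] pairs describing each query-stroke point
    relative to the reference point R: the angle to the x-axis and the
    Euclidean distance, i.e. polar coordinates centered at R."""
    rx, ry = ref_point
    return [[math.atan2(y - ry, x - rx), math.dist(ref_point, (x, y))]
            for (x, y) in query_stroke]

def normalize_distances(samples):
    """Z-normalize the distance feature across a training set (Eq. 3);
    angles are left unnormalized, matching the paper's choice."""
    ds = [d for seq in samples for (_, d) in seq]
    m = sum(ds) / len(ds)
    var = sum((d - m) ** 2 for d in ds) / len(ds)
    sd = math.sqrt(var) or 1.0
    return [[[a, (d - m) / sd] for (a, d) in seq] for seq in samples]
```

The resulting sequences of [α_i, d_i] pairs are what would be fed, sample by sample, to a sequence classifier such as a (B)LSTM RNN.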
4. Experiments
We made experiments on two tasks. The first one is the
task defined in this paper: classification of strokes into the two
classes head and not-head, where the reference stroke is a part
of the arrow's shaft. The second task is to classify argument
strokes representing accentuation or punctuation of their refer-
ence strokes into 18 classes. The task as well as the database
called ACCENT was proposed by Bouteruche et al. In the
case of arrows, we additionally evaluated the whole process
of arrow detection, where the stroke classification is a sub-
task. We used both positioning methods to solve both tasks
and we made a comparison. All experiments were done on
a standard tablet PC Lenovo X230 (Intel Core i5 2.6 GHz,
8GB RAM) with 64-bit Windows 7 operating system.
4.1. Arrows
We used the FA database for this experiment. The ver-
sion 1.1 contains the annotation of heads and shafts of ar-
rows. We extracted a reference point and stroke for each
arrow as described in Section 2.1. The only difference is
that the shaft is known from the annotation. We created
a set of query strokes and rotated these strokes according
to the reference stroke. We extracted features with respect
to the reference point or the reference stroke depending on
the used method for each query stroke and assigned a label
based on the annotation from the database. We refer to the
samples with the label head as positive and those with the
label not-head as negative samples. The FA database con-
sists of 12 diagram patterns drawn by several users and it is
split into training and test dataset. The training dataset con-
tains diagrams from 11 users (132 diagrams) and the test
dataset diagrams from 7 users (84 diagrams). Each dia-
gram is formed of 54 strokes and contains 5 symbols and
10 arrows on average. We extracted 1480/834 positive and
1263/1019 negative samples from the training/test dataset.
Arrows drawn by one stroke are manually segmented in the
database. However, to demonstrate our segmentation algo-
rithm from Section 2.2, we created a second test dataset (referred
to as test2), where we further segmented query strokes. Ob-
tained sub-strokes created new samples with the same label
as the original ones. We used this dataset to show that possi-
ble oversegmentation will not lower the final precision. We
created 1252 positive and 1876 negative examples this way.
For our method, we used LSTM and BLSTM RNNs im-
plemented within the library JANNLab [12]. We tried dif-
ferent numbers of nodes in the hidden layer to get the best
performance. We always trained the network for 200 epochs
with the following parameters: learning rate 0.001, momen-
tum 0.9. We achieved the best overall precision of 99.9%
with the BLSTM RNN with 32 nodes in the hidden layer.
However, it might be important to find a trade-off between
precision and time complexity and thus it might be better
to use the LSTM RNN with only 8 nodes in the hidden
layer, because it is significantly faster. It gives the precision
of 99.6 % and the average time needed for classification is
0.79 ms. For details, refer to Figure 5. The best achieved
precisions for individual classes are given in Table 1. The
achieved precision on test2 with the best trained neural
network did not decrease and reached 99.9 %.
For the method of Bouteruche et al., we used a RBFN
implemented within the library Encog [10]. We set the
number of the nodes in the hidden layer to be a power of
the number of features, which leads to equally spaced RBF
centers. It is the setting giving the best performance. We
tried two sets of features proposed by Bouteruche et al. re-
ferred in their paper by numbers 4 and 5 and we achieved
the accuracy of 95.4 % and 88.2 %, respectively. It is not
surprising that the feature set number 5 reached much worse
results. It contains features expressing how much a query
stroke fits into structuring elements of all classes. However,
in this case, we have just two classes and the class of nega-
tive samples contains arbitrarily shaped strokes and thus the
structuring elements are too wide. We also implemented the
method by Delaye et al. [7]. Their filtered fuzzy landscape
is an improvement of Bouteruche's feature set 5 and thus
it gives rather low precision for the very same reason. The
feature set number 4 gives much better results. However, it
was still inferior in comparison with our method – the best
overall precision of 95.36 %. For more detailed results see
again Table 1.
Since we use an RNN in our method, the classification has
a higher time complexity (especially with increasing com-
plexity of the net). The classification made by an RBFN is
indeed very fast. On the other hand, it is much faster to
extract the low level features we use: 0.016 ms per sample.
Feature extraction is slower in the case of fuzzy positioning:
2.89 ms per sample for the feature set number 4 and 0.99 ms
per sample for the feature set number 5.
Figure 5. Dependency of precision and time complexity on the number of nodes in the hidden layer of RNNs for the FA database. (Left: precision [%] vs. number of nodes, 2-32; right: time needed to classify one sample [ms]; curves for LSTM and BLSTM.)
Method positive negative overall
Ours 99.91 % 99.85 % 99.88 %
Bouteruche et al. (4) 98.56 % 92.75 % 95.36 %
Bouteruche et al. (5) 94.24 % 83.32 % 88.24 %
Delaye et al. 95.17 % 86.07 % 90.17 %
Table 1. Comparison of precisions for arrow heads detection.
4.1.1 Arrow detector test
We took all annotated symbols with rigid bodies and
tried to find arrows with the arrow detector we proposed
(query strokes for arrow heads were classified with our best
BLSTM RNN). We compared the detected arrows with an-
notated arrows. Let us recall that all pairs of symbols were
considered. Conflicting arrow shafts were removed imme-
diately. However, adding arrow heads may cause other
conflicts. The result of the arrow detector is a list of ar-
row candidates and a structural analysis should be done to
solve the conflicts. However, we tried to remove conflicts
by simply keeping arrows with higher confidence to see how
it affects recall and precision. The test dataset of the FA
database contains 796 arrows. We achieved the recall of
95.4 % / 94.2 % and the precision of 41.5 % / 95.4 % without/
with conflict removal, respectively. While searching for arrow
heads, our arrow detector performs 106.5 stroke classifications
per diagram on average, whereas there are 10 arrows
per diagram on average.
4.1.2 Diagram recognition pipeline test
We embedded our arrow detector into the diagram recogni-
tion pipeline proposed earlier [4] and made experiments on
the FA and FC databases. The FC database does not contain
annotation of arrow heads and shafts. Therefore, we used
the arrow head classifier trained on the FA database in both
cases. The results are shown in Tables 2 and 3. Although there
is an improvement in both domains, it is more significant
in the FA domain. The recognition accuracy increased in
all symbol classes, which shows that misrecognized arrows
can cause further errors in classification of other symbols.
Class | Correct stroke labeling [%] (Previous / Proposed) | Correct symbol segmentation and recognition [%] (Previous / Proposed)
Arrow | 89.3 / 94.9 | 84.4 / 92.8
Arrow in | 78.5 / 85.0 | 80.0 / 84.0
Final state | 96.1 / 99.2 | 93.8 / 98.4
State | 95.2 / 96.9 | 94.5 / 97.2
Label | 99.1 / 99.8 | 96.0 / 99.1
Total | 94.5 / 97.4 | 91.5 / 96.4
Table 2. Diagram recognition results for the FA domain.
Class | Correct stroke labeling [%] (Previous / Proposed) | Correct symbol segmentation and recognition [%] (Previous / Proposed)
Arrow | 85.3 / 88.7 | 74.4 / 78.1
Connection | 93.3 / 94.1 | 93.6 / 95.1
Data | 95.6 / 96.4 | 88.8 / 90.6
Decision | 90.8 / 90.9 | 74.1 / 75.3
Process | 93.7 / 95.2 | 87.2 / 88.1
Terminator | 89.7 / 90.2 | 88.1 / 88.9
Text | 99.0 / 99.3 | 87.9 / 89.7
Total | 95.2 / 96.5 | 82.8 / 84.43
Table 3. Diagram recognition results for the FC domain.
4.2. Accent
The Accent database consists of pairs of reference and
argument strokes. The task is to classify the argument
strokes into 18 graphic gestures. Two of them correspond
to the addition of a stroke to a character. The 16 others (see
Figure 6) correspond to an accentuation of their reference
character (acute, grave, cedilla, etc.), to a punctuation sym-
bol (comma, dot, apostrophe, etc.), or to an editing gesture
(space, carriage return, etc.). As several subsets of gestures
have the same shape, the only way to discriminate them is
to use spatial context – their relative position. The exam-
ples of the benchmark have been written on a PDA by 14
writers. The training database contains 4243 examples of 8
writers and the test database contains 2393 examples of 6
writers. None of the writers is common to both data sets.
Figure 6. Classes of the argument strokes in ACCENT database.
To apply our method, we set the center of each reference
stroke's bounding box as the reference point and extracted
features. We tried LSTM and BLSTM RNNs the same
way as in the case of the Arrow database. However, we
achieved the precision of only 91.9 %. It turned out that
our features have a problem distinguishing very small ar-
gument strokes like acute, apostrophe, or dieresis. These
strokes often consist of just one single point. Therefore,
tures describing the appearance of strokes. We used four
features introduced by Otte et al. [13]: an index of the point
to distinguish long and short strokes, sine and cosine of the
angle between the current and the last line segment (zero
for extreme points), and sum of lengths of the current and
the previous line segments. Let us note that the point in-
dices and distances are normalized (3). We refer to the two
sets of features and associated experiments as basic and ex-
tended. We achieved the best precision with the extended
features and the BLSTM RNN with 32 nodes in the hidden
layer, which was 93.6 %. The training was done again with
the learning rate of 0.001 and the momentum of 0.9. The
ROC curves and time complexities are shown in Figure 7.
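The four per-point local features above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name is ours, and normalizing by the stroke's bounding-box diagonal is an assumption standing in for the paper's Eq. (3).

```python
import math

def extended_local_features(stroke):
    """Per-point local features in the spirit of Otte et al. [13]:
    normalized point index, sine/cosine of the turning angle between
    the current and previous line segments (zero for extreme points),
    and the summed length of the two adjacent segments.
    `stroke` is a list of (x, y) tuples; lengths are normalized by the
    bounding-box diagonal (our assumption in place of Eq. (3))."""
    n = len(stroke)
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    feats = []
    for i in range(n):
        idx = i / (n - 1) if n > 1 else 0.0
        sin_a = cos_a = 0.0        # zero at extreme points
        seg_len = 0.0
        if i > 0:
            cur = (xs[i] - xs[i - 1], ys[i] - ys[i - 1])
            seg_len += math.hypot(*cur)
            if i > 1:
                prev = (xs[i - 1] - xs[i - 2], ys[i - 1] - ys[i - 2])
                seg_len += math.hypot(*prev)
                lp, lc = math.hypot(*prev), math.hypot(*cur)
                if lp > 0 and lc > 0:
                    # dot product gives cosine, cross product gives sine
                    cos_a = (prev[0] * cur[0] + prev[1] * cur[1]) / (lp * lc)
                    sin_a = (prev[0] * cur[1] - prev[1] * cur[0]) / (lp * lc)
        feats.append((idx, sin_a, cos_a, seg_len / diag))
    return feats
```

For a right-angle stroke such as [(0, 0), (1, 0), (1, 1)], the last point gets sine 1.0 (a 90-degree left turn), while the end points get zero turning features.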
For the method of Bouteruche et al. [2], we used our own reimplementation to run the experiments. We confirm their stated result, a precision of 95.75 %.
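For context, the directional component underlying such fuzzy relative positioning can be sketched as below. This is a simplified, Bloch-style score, assuming a linear angular membership and a point-wise maximum over the reference stroke; it is not necessarily the exact formulation of [2].

```python
import math

def fuzzy_direction_score(reference_pts, argument_pt, alpha):
    """Degree (in [0, 1]) to which `argument_pt` lies 'in direction
    alpha' (radians) of the reference stroke. Each reference point
    votes with a linear membership f(theta) = max(0, 1 - 2*theta/pi),
    where theta is the angular deviation from alpha; the stroke-level
    score is the maximum over all reference points."""
    best = 0.0
    ax, ay = argument_pt
    for rx, ry in reference_pts:
        if (ax, ay) == (rx, ry):
            continue  # direction undefined for coincident points
        theta = abs(math.atan2(ay - ry, ax - rx) - alpha)
        theta = min(theta, 2 * math.pi - theta)  # wrap into [0, pi]
        best = max(best, max(0.0, 1.0 - 2.0 * theta / math.pi))
    return best
```

A point due east of the reference scores 1.0 for alpha = 0, a point due north scores 0.0, and a diagonal point scores 0.5; a full positioning model combines several such directional scores with distance information.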
5. Conclusions
We have shown how important and difficult a task arrow recognition is for the whole process of diagram recognition. We designed an arrow recognizer which detects arrows in two steps: a) detection of the arrow's shaft, b) detection of the arrow's head. The first step is easy because the search for a shaft is guided by the detected symbols connected by the arrow. For the second step, we proposed a novel arrow head
[Figure 7: two panels, "Precision of RNNs" (precision [%]) and "Time needed to classify one sample" (time [ms]), plotted against the number of nodes in the hidden layer (2, 4, 8, 16, 32, 64), with curves for Basic LSTM, Basic BLSTM, Extended LSTM, and Extended BLSTM.]
Figure 7. Dependency of precision and running time on the num-
ber of nodes in the hidden layer of RNNs for ACCENT database.
classifier based on relative stroke positioning. We presented a classification method based on low-level features using (B)LSTM RNNs. We embedded the proposed arrow detector into the diagram recognition pipeline and increased the accuracy of the state-of-the-art diagram recognizer on the benchmark databases of finite automata and flowcharts.
We have also made a comparison with the state-of-the-art fuzzy relative positioning method. That method is unable to solve the proposed arrow detection task adequately and reaches inferior precision. However, on the task for which it was developed, our method gives slightly worse results. This implies that fuzzy positioning might be a good solution for certain tasks (data), but it is not a general tool. Our method, on the other hand, seems to be more general, since it gave relatively good results in both cases. Even where it gives slightly worse results, it may be a good alternative thanks to its simplicity and fast feature extraction.
Acknowledgment
The first author was supported by the Grant Agency of the CTU under project SGS13/205/OHK3/3T/13. The second and third authors were supported by the Grant Agency of the Czech Republic under project P103/10/0783 and by the Technology Agency of the Czech Republic under project TE01020197 Center for Applied Cybernetics, respectively.
References
[1] A.-M. Awal, G. Feng, H. Mouchere, and C. Viard-Gaudin.
First experiments on a new online handwritten flowchart
database. In DRR 2011, pages 1–10, 2011.
[2] F. Bouteruche, S. Macé, and E. Anquetil. Fuzzy relative positioning for on-line handwritten stroke analysis. In Proceedings of IWFHR 2006, pages 391–396, 2006.
[3] M. Bresler, D. Průša, and V. Hlaváč. Modeling flowchart structure recognition as a max-sum problem. In Proceedings of ICDAR 2013, pages 1247–1251, August 2013.
[4] M. Bresler, T. V. Phan, D. Průša, M. Nakagawa, and V. Hlaváč. Recognition system for on-line sketched diagrams. In Proceedings of ICFHR 2014, pages 563–568, September 2014.
[5] C. Carton, A. Lemaitre, and B. Coüasnon. Fusion of statistical and structural information for flowchart recognition. In Proceedings of ICDAR 2013, pages 1210–1214, 2013.
[6] A. Delaye and E. Anquetil. Fuzzy relative positioning templates for symbol recognition. In Proceedings of ICDAR 2011, pages 1220–1224, September 2011.
[7] A. Delaye, S. Macé, and E. Anquetil. Modeling relative positioning of handwritten patterns. In Proceedings of IGS 2009, pages 122–127, 2009.
[8] M. El Meseery, M. El Din, S. Mashali, M. Fayek, and N. Darwish. Sketch recognition using particle swarm algorithms. In Proceedings of ICIP 2009, pages 2017–2020, 2009.
[9] G. Feng and C. Viard-Gaudin. Stroke fragmentation based
on geometry features and HMM. CoRR, 2008.
[10] Heaton Research, Inc. Encog Machine Learning Framework,
2013. http://www.heatonresearch.com/encog.
[11] J. Herold and T. F. Stahovich. Classyseg: A machine learning
approach to automatic stroke segmentation. In Proceedings
of SBIM 2011, pages 109–116, 2011.
[12] S. Otte, D. Krechel, and M. Liwicki. JANNLab Neural Network Framework for Java. In Proceedings of MLDM 2013, pages 39–46, 2013.
[13] S. Otte, D. Krechel, M. Liwicki, and A. Dengel. Local
feature based online mode detection with recurrent neural
networks. In Proceedings of ICFHR 2012, pages 531–535,
2012.
[14] A. Stoffel, E. Tapia, and R. Rojas. Recognition of on-line handwritten commutative diagrams. In Proceedings of ICDAR 2009, pages 1211–1215, 2009.
[15] A. Wolin, B. Paulson, and T. Hammond. Sort, merge, repeat: An algorithm for effectively finding corners in hand-sketched strokes. In Proceedings of SBIM 2009, pages 93–99, 2009.