Content available from International Journal of Multimedia Information Retrieval
Int J Multimed Info Retr (2017) 6:31–47
DOI 10.1007/s13735-016-0110-y
TRENDS AND SURVEYS
A Survey on handwritten documents word spotting
Rashad Ahmed1,2 · Wasfi G. Al-Khatib1 · Sabri Mahmoud1
Received: 10 August 2016 / Revised: 6 September 2016 / Accepted: 19 September 2016 / Published online: 15 October 2016
© Springer-Verlag London 2016
Abstract Along with the explosive growth in the number of
handwritten documents that are preserved, processed and
accessed in digital form, word spotting in handwritten
document images has attracted researchers from various
research communities, such as pattern recognition, computer
vision and information retrieval. Handwritten document word
spotting has been an active research area, and significant
progress has been made in the last few years. However,
despite this progress, handwritten document word spotting
still struggles to achieve acceptable performance on real-world
handwritten document images, which vary widely in writing
style and quality. This paper gives an overview of published
research efforts in handwritten document image word spotting
and of the technologies used in the field. We start by
describing a general model for document word spotting,
followed by a discussion of the present challenges in
handwritten document word spotting. Then the databases used
for handwritten document word spotting and other handwritten
text tasks are discussed. After that, research work on
handwritten document word spotting is presented. Finally,
several summary tables of published research work are
provided, covering the handwritten document databases used
and the reported performance results on handwritten document
word spotting. These tables summarize different aspects and
the reported accuracy of each technique.
Rashad Ahmed
othmanr@kfupm.edu.sa
Wasfi G. Al-Khatib
wasfi@kfupm.edu.sa
Sabri Mahmoud
smasaad@kfupm.edu.sa
1 ICS Department, King Fahd University of Petroleum and
Minerals, Dhahran 31261, Saudi Arabia
2 CS Department, Taiz University, Taiz, Yemen
Keywords Word spotting · Content-based image retrieval
(CBIR) · Document indexing · Document retrieval ·
Historical document word spotting
1 Introduction
Due to advances in information and communication
technology, recent years have witnessed a dramatic growth in
the number of handwritten documents that are preserved,
processed and accessed in digital form. Historical documents,
a subset of handwritten documents, are valuable resources
for scholars, so it is desirable to make their contents available
via the Internet or other electronic media. The main problem
is that such contents are available only as images, which
makes them difficult to search. In this case, document image
word spotting techniques can be used to search for textual
information in the digitized document images and make this
information accessible to users. Word spotting is the task of
locating specific words in a collection of document images.
There are two principal approaches to searching document
images. The first is traditional text search, which requires
efficient optical character recognition (OCR) techniques;
such techniques are currently unavailable for most handwritten
documents [1–3]. These methods are referred to as OCR-based
techniques. OCR-based techniques are not a suitable choice
for handwritten documents because of several challenges
these documents present, including (1) poor document quality,
(2) writing style variability, (3) multiple writing styles and
(4) word writing variations. The second approach is to use
word spotting techniques [4] to search in the image domain. In