
A Survey on handwritten documents word spotting

Abstract and Figures

With the explosive growth in the number of handwritten documents that are preserved, processed and accessed in digital form, word spotting in handwritten document images has attracted researchers from several communities, such as pattern recognition, computer vision and information retrieval. Handwritten document word spotting has been an active research area, and significant progress has been made in the last few years. Despite this progress, however, word spotting can still hardly achieve acceptable performance on real-world handwritten document images, which vary widely in writing style and quality. This paper gives an overview of published research in the area of handwritten document image word spotting and of the technologies used in the field. We first describe a general model for document word spotting and then discuss the present challenges in handwritten document word spotting. Next, the databases used for handwritten document word spotting and other handwritten text tasks are discussed. After that, research works on handwritten document word spotting are presented. Finally, several summary tables of published research are provided, covering the handwritten document databases used and the performance results reported for handwritten document word spotting. These tables summarize different aspects of each technique and its reported accuracy.
Int J Multimed Info Retr (2017) 6:31–47
DOI 10.1007/s13735-016-0110-y
TRENDS AND SURVEYS
Rashad Ahmed 1,2 · Wasfi G. Al-Khatib 1 · Sabri Mahmoud 1
Received: 10 August 2016 / Revised: 6 September 2016 / Accepted: 19 September 2016 / Published online: 15 October 2016
© Springer-Verlag London 2016
Rashad Ahmed
othmanr@kfupm.edu.sa
Wasfi G. Al-Khatib
wasfi@kfupm.edu.sa
Sabri Mahmoud
smasaad@kfupm.edu.sa
1 ICS Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
2 CS Department, Taiz University, Taiz, Yemen
Keywords: Word spotting · Content-based image retrieval (CBIR) · Documents indexing · Documents retrieval · Historical documents word spotting
1 Introduction
Due to advances in information technology and communication, recent years have witnessed a dramatic growth in the number of handwritten documents that are preserved, processed and accessed in digital form. Historical documents, as a subset of handwritten documents, are valuable resources for scholars, and their contents can be made available via the internet or other electronic media. The main problem is that such contents are only available in image formats, which makes them difficult to search. In this case, document image word spotting techniques can be used to search the textual information in the digitized document images and make this information accessible to users. Word spotting is the task of locating specific words in a collection of document images. There are two principal approaches to searching document images. The first is the traditional text search approach, which requires efficient optical character recognition (OCR) techniques; these are currently unavailable for most handwritten documents [13]. These methods are referred to as OCR-based techniques. OCR-based techniques are not a proper choice for handwritten documents because of several challenges, including: (1) poor quality documents, (2) writing style variability, (3) multiple writing styles and (4) word writing variations. The second approach is to use word spotting techniques [4] to search in the image domain.
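To make the idea of searching in the image domain concrete, the following is a minimal sketch of query-by-example word spotting, not the method of any specific surveyed paper. It assumes the page has already been segmented into binarized word images (1 = ink, 0 = background). The helper names `profile_features`, `dtw_distance` and `spot` are hypothetical, and the choice of column-wise profile features compared with dynamic time warping (DTW) is one common option used here for illustration only.

```python
import numpy as np

def profile_features(word_img):
    """Column-wise features of a binary word image (1 = ink, 0 = background).

    Returns an array of shape (width, 3) holding, per column, the ink count,
    the upper profile and the lower profile, each divided by the image height.
    """
    img = np.asarray(word_img, dtype=bool)
    height, _ = img.shape
    has_ink = img.any(axis=0)
    ink_count = img.sum(axis=0)
    # Distance from the top edge to the first ink pixel (height if no ink).
    upper = np.where(has_ink, img.argmax(axis=0), height)
    # Distance from the bottom edge to the last ink pixel (height if no ink).
    lower = np.where(has_ink, img[::-1].argmax(axis=0), height)
    return np.stack([ink_count, upper, lower], axis=1) / height

def dtw_distance(a, b):
    """DTW cost between two feature sequences, normalized by their total length."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((a[i - 1] - b[j - 1]) ** 2)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def spot(query_img, candidate_imgs):
    """Return candidate indices ordered from most to least similar to the query."""
    q = profile_features(query_img)
    dists = [dtw_distance(q, profile_features(c)) for c in candidate_imgs]
    return sorted(range(len(candidate_imgs)), key=lambda k: dists[k])
```

DTW is used in this sketch because it tolerates horizontal stretching between two instances of the same word, which a fixed column-by-column comparison would penalize. No recognition step is involved: the query and the candidates are compared purely as images.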
... Despite these challenges, proposed approaches provide efficient support for constrained documents, ensuring efficient retrieval from vocabulary. This review [16] explores various word spotting techniques, categorizing them based on extracted features. Profile-based features capture the word outline, with Manmatha and Croft achieving an average precision rate of 72.65%. ...
Article
This descriptive abstract summarizes a thorough examination into the use of smart technology for answer sheet evaluation. The study explores how to automate the grading process using Artificial Intelligence, Machine Learning and other algorithms to improve efficiency and objectivity while evaluating student responses. Examined are several smart assessment systems, stressing attributes such as adaptive learning processes, pattern recognition and natural language processing. The abstract delves into the possible advantages, obstacles and ramifications linked to the implementation of intelligent response sheet assessment techniques in educational environments. The abstract offers insights into the changing landscape of assessment methodologies through a synthesis of recent research findings, illuminating the revolutionary potential of intelligent systems in reshaping education in the future
... For decades, researchers used different techniques in each step [1][2][3][4]. In recent decades, many techniques have been applied to Arabic HR, such as scale-invariant feature transform (SIFT) [5], hidden Markov models (HMM), support vector machines (SVMs), and K-nearest neighbors (KNNs). Chergui and Kef [6] used the IFN/ENIT database to test their novel approach based on the SIFT descriptor and achieved a recognition rate of 90.61%. ...
... On the information retrieval and keyword spotting front, there are a plethora of works dealing with handwritten document indexing and retrieval [26,22,8,2,57]. One relevant example is the ImageCLEF 2016 Handwritten Scanned Document Retrieval challenge [57], aimed at developing retrieval systems for handwritten documents. Although there are some similarities between the ImageCLEF 2016 document retrieval challenge and the QA on handwritten documents proposed in this paper-their queries have multiple words (like our questions) and the retrieval instance is a document segment/snippet-the task of document retrieval which they address differs clearly from the proposed QA in the following aspects: (i) queries in their case are not natural language questions but search queries having multiple tokens and (ii) their task requires that all the tokens in the input query appear in the same order in the retrieved document snippet. ...
Preprint
This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate results of the proposed approach on two new datasets: (i) HW-SQuAD: a synthetic, handwritten document image counterpart of SQuAD1.0 dataset and (ii) BenthamQA: a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using an OCR. Datasets presented in this work are available to download at docvqa.org
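The retrieval step implied by a common text and image embedding space can be sketched as a nearest-neighbour ranking. This is an illustrative sketch only: the function name `rank_by_cosine` is hypothetical, the vectors stand in for the outputs of whatever embedding network is used, and cosine similarity is an assumed choice of metric that the abstract above does not specify.

```python
import numpy as np

def rank_by_cosine(query_vec, image_vecs):
    """Rank word-image embeddings by cosine similarity to a query embedding.

    query_vec:  1-D array, embedding of the textual query word (placeholder).
    image_vecs: 2-D array, one row per word-image embedding (placeholder).
    Returns (order, scores): indices from best to worst match and their scores.
    """
    q = np.asarray(query_vec, dtype=float)
    x = np.asarray(image_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    scores = x @ q                # cosine similarity of each row with the query
    order = np.argsort(-scores)   # highest similarity first
    return order, scores[order]
```

Because both modalities live in the same space, the same ranking function serves text queries and exemplar image queries; only the source of `query_vec` changes.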
... The digitisation makes the manuscripts available to a wider audience, and preserves the cultural heritage. The automatic recognition of textual corpora and named entities generated from medieval and early-modern manuscript sources with high accuracy is a challenge [2,20,22]. Manuscript images are often processed through keyword spotting or word recognition to be accessed and searched, such as [4,8,14,17] and [18]. Some papers build a search system for handwritten images, such as [1,5,15,16,21] and [23]. ...
Chapter
A very large number of historical manuscript collections are available in image formats and require extensive manual processing in order to search through them. So, we propose and build a search engine for automatically storing, indexing and efficiently searching the manuscript images. Firstly, a handwritten text recognition technique is used to convert the images into textual representations. In the next steps, we apply the named entity recognition and historical knowledge graph to build a semantic search model, which can understand the user’s intent in the query and the contextual meaning of concepts in documents, to return correctly the transcriptions and their corresponding images for users.
Preprint
In the realm of data analysis and document processing, the recognition of handwritten numerals stands as a pivotal advancement. This contribution has steered transformative shifts in optical character recognition, historical handwritten document analysis, and postal automation. A persistent challenge in this arena is the recognition of handwritten digits across a spectrum of languages, each with its idiosyncrasies. We present an innovative paradigm to surmount this hurdle, transcending the confines of monolingual recognition. Unlike the status quo, which gravitates toward a narrow subset of languages, our method orchestrates a comprehensive solution spanning 12 distinct languages, deftly navigating linguistic intricacies. The catalyst for this efficacy is transfer learning, amplifying image quality and recognition acumen. Emboldening this framework is an ingenuity-charged attention-based module that refines precision. Our rigorous experimentations substantiate quantum leaps in image quality and the prowess of linguistic and numeral recognition. Notably, we unearth significant accuracy strides, eclipsing 2% enhancements in specific languages vis-à-vis antecedent methodologies. This endeavor epitomizes a sturdy, economically sound avenue, unshackling multilingual handwritten numeral recognition to an expansive spectrum of languages.
Chapter
This paper aims to inspect the often neglected role of Graphical User Interfaces (GUI) in AI-based tools designed to assist in the transcription of handwritten documents. While the precision and recall of the handwritten word recognition have traditionally been the primary focus, we argue that the time parameter associated with the GUI, specifically in terms of validation and correction, plays an equally crucial role. By investigating the influence of GUI design on the validation and correction aspects of transcription we want to highlight how the time that the user must take to interact with the interface must be taken into account to evaluate the performance of the transcription process. Through comprehensive analysis and experimentation, we illustrate the profound impact that GUI design can have on the overall efficiency of transcription tools. We demonstrate how the time saved through the utilization of an assistant tool is heavily dependent on the operations performed within the interface and the diverse features it offers. By recognizing GUI design as an essential component of transcription tools, we can unlock their full potential and significantly improve their effectiveness.
Chapter
We address the problem of estimating the tradeoff between the size of the training set and the performance of a KWS when used to assist the transcription of small collections of historical handwritten documents. As this application domain is characterized by a lack of data, and techniques such as transfer learning and data augmentation require more resources than those that are commonly available in the organizations holding the collections, we address the problem of getting the best out of the available data. For this purpose, we reformulate the problem as that of finding the training set size that yields a KWS whose performance, when used to support the transcription, gives the largest reduction in the human effort needed to transcribe the collection completely. The results of a large set of experiments on three publicly available datasets widely adopted as benchmarks for performance evaluation show that a training set made of 5 to 8 pages is enough for achieving the largest reduction, independently of the actual pages included in the training set and the corresponding keyword lists. They also show that the actual time reduction depends much more on the keyword list than on the KWS performance.
Chapter
Handwritten documents from communities like cultural heritage, judiciary, and modern journals remain largely unexplored even today. To a great extent, this is due to the lack of retrieval tools for such unlabeled document collections. This work considers such collections and presents a simple, robust retrieval framework for easy information access. We achieve retrieval on unlabeled novel collections through invariant features learned for handwritten text. These feature representations enable zero-shot retrieval for novel queries on unlabeled collections. We improve the framework further by supporting search via text and exemplar queries. Four new collections written in English, Malayalam, and Bengali are used to evaluate our text retrieval framework. These collections comprise 2957 handwritten pages and over 300K words. We report promising results on these collections, despite the zero-shot constraint and huge collection size. Our framework allows the addition of new collections without any need for specific finetuning or labeling. Finally, we also present a demonstration of the retrieval framework.
Keywords: Document retrieval · Keyword spotting · Zero-shot retrieval
Chapter
Currently, deep learning techniques have become the core of recent research in the pattern recognition domain, especially in the handwriting recognition field, where challenges for the Arabic language still remain. Despite their importance and performance, to the best of our knowledge, deep learning techniques have not been investigated in the context of Arabic handwritten literal amount recognition. The main aim of this paper is to investigate the effect of several Convolutional Neural Networks (CNNs) based on the proposed architecture with regularization parameters in this context. To achieve this aim, the AHDB database was used, and very promising results were obtained, outperforming previous works on this database.
Keywords: Arabic handwriting · Literal amount recognition · Offline recognition · Deep learning · Resnet · VGG