
A Survey on handwritten documents word spotting

March 2017
International Journal of Multimedia Information Retrieval 6(1)


DOI:10.1007/s13735-016-0110-y

Authors:

Rashad Ahmed

King Fahd University of Petroleum and Minerals

Wasfi G. Al-Khatib

King Fahd University of Petroleum and Minerals

Sabri Mahmoud

King Fahd University of Petroleum and Minerals

Along with the explosive growth in the number of handwritten documents that are preserved, processed and accessed in digital form, word spotting in handwritten document images has attracted researchers from several communities, including pattern recognition, computer vision and information retrieval. Handwritten document word spotting has been an active research area, and significant progress has been made in the last few years. Despite this progress, word spotting still struggles to achieve acceptable performance on real-world handwritten document images, which vary widely in writing style and quality. This paper gives an overview of published research on handwritten document image word spotting and of the technologies used in the field. We first describe a general model for document word spotting and then discuss current challenges in handwritten document word spotting. Next, we discuss the databases used for handwritten document word spotting and other handwritten text tasks. After that, we present research work on handwritten document word spotting. Finally, we provide several summary tables covering the handwritten document databases used and the performance results reported for handwritten document word spotting. These tables summarize different aspects of each technique and its reported accuracy.

[Figure: Word spotting vs word recognition]


Int J Multimed Info Retr (2017) 6:31–47


TRENDS AND SURVEYS


Rashad Ahmed (1, 2) · Wasfi G. Al-Khatib (1) · Sabri Mahmoud (1)

Received: 10 August 2016 / Revised: 6 September 2016 / Accepted: 19 September 2016 / Published online: 15 October 2016

Rashad Ahmed
othmanr@kfupm.edu.sa

Wasfi G. Al-Khatib
wasfi@kfupm.edu.sa

Sabri Mahmoud
smasaad@kfupm.edu.sa

1 ICS Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

2 CS Department, Taiz University, Taiz, Yemen

Keywords: Word spotting · Content-based image retrieval (CBIR) · Documents indexing · Documents retrieval · Historical documents word spotting

1 Introduction

Due to advances in information and communication technology, recent years have witnessed dramatic growth in the number of handwritten documents that are preserved, processed and accessed in digital form. Historical documents, a subset of handwritten documents, are valuable resources for scholars, and their contents can be made available via the internet or other electronic media. The main problem is that such contents are available only as images, which makes them difficult to search. In this case, document image word spotting techniques can be used to search the textual information in the digitized document images and make it accessible to users. Word spotting is the task of locating specific words in a collection of document images. There are two principal approaches to searching document images. The first is traditional text search, which requires effective optical character recognition (OCR) techniques; these are currently unavailable for most handwritten documents [1–3]. Such methods are referred to as OCR-based techniques. OCR-based techniques are not a suitable choice for handwritten documents because of several challenges, including (1) poor document quality, (2) writing style variability, (3) multiple writing styles and (4) word writing variations. The second is to use word spotting techniques [4] to search in the image domain. In …
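To make the idea of searching in the image domain more concrete, the following is a minimal sketch of query-by-example word spotting. It is not taken from the survey. The feature (a per-column ink profile), the matching method (dynamic time warping) and all function names are illustrative assumptions chosen for brevity. Word images are assumed to be already segmented and binarized.

```python
from typing import List, Sequence, Tuple

# Assumption: a word image is a binary matrix, 1 = ink, 0 = background.
Image = List[List[int]]


def column_profile(img: Image) -> List[float]:
    """Fraction of ink pixels in each column (a simple projection profile)."""
    height = len(img)
    width = len(img[0])
    return [sum(img[r][c] for r in range(height)) / height for c in range(width)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping cost between two profiles, divided by their total length."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def spot(query: Image, candidates: List[Tuple[str, Image]]) -> List[Tuple[str, float]]:
    """Rank candidate word images by increasing distance to the query image."""
    q = column_profile(query)
    scored = [(name, dtw_distance(q, column_profile(img))) for name, img in candidates]
    return sorted(scored, key=lambda item: item[1])


# Toy data (hypothetical): "stretched" is the query with its first column repeated,
# imitating a wider rendering of the same word; "other" has a different shape.
query = [[1, 0, 0, 1],
         [1, 1, 0, 1],
         [1, 0, 0, 1]]
stretched = [[1, 1, 0, 0, 1],
             [1, 1, 1, 0, 1],
             [1, 1, 0, 0, 1]]
other = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]

ranked = spot(query, [("other", other), ("stretched", stretched)])
for name, distance in ranked:
    print(name, round(distance, 3))
```

No character is recognized at any point: the query and the candidates are compared purely as images, which is what distinguishes this style of search from the OCR-based approach. Real systems would use richer features and handle segmentation and noise, which this sketch leaves out.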

