Conference Paper

Word Spotting using Radial Descriptor

... Recently, we presented the Radial Descriptor [1], and studied its application for spotting keywords in Arabic historical documents using the bag-of-features model. The radial descriptor describes the neighborhood of a feature point in a compact manner. ...
... To build the Radial Descriptor Graph for an input image, I, we compute the radial descriptor [1] features for I and select the feature points with the highest variance. We generate the initial graph from these feature points and iteratively apply a merge operation to reduce the size of the graph and remove redundant features. ...
... Then, Dynamic Time Warping (DTW) is used to match features. Kassis et al. [21], [22] proposed an approach composed of two steps: i) word-parts feature extraction using the radial descriptor and ii) word-parts matching where the distance between their occurrence probability histograms is computed. In [5], the word spotting task is formulated as a classification problem. ...
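The snippet above mentions Dynamic Time Warping for matching feature sequences. As an illustration only, here is a minimal sketch of classic DTW, assuming each word image has already been reduced to a sequence of fixed-length feature vectors. The function name `dtw_distance`, the Euclidean local cost, and the unconstrained warping path are assumptions made for this example, not details taken from the cited works.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Return the DTW alignment cost between two sequences of feature vectors.

    Each sequence is a list of equal-length tuples (one tuple per frame).
    The local cost is the Euclidean distance between frames (an assumption).
    """
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical one-dimensional feature sequences for two word images.
query = [(1.0,), (2.0,), (3.0,)]
candidate = [(1.0,), (2.0,), (2.0,), (3.0,)]
score = dtw_distance(query, candidate)
```

Because DTW allows one frame to align with several frames of the other sequence, the stretched candidate above aligns to the query at no cost. In a spotting setting, candidates would typically be ranked by this score, lowest first.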
Conference Paper
Word spotting in historical Arabic documents is a challenging task due to the complexity of document layouts. This paper proposes a novel word spotting approach that learns a feature representation to describe word images. The objective is to investigate optimal embedding spaces from which to extract a discriminative word image representation. The proposed approach consists of two steps: i) construct a CNN-based embedding space with a triplet loss and then ii) match embedding representations using the Euclidean distance. For training, the CNN takes as input a set of triplet samples (anchor, positive sample and negative sample). The triplet loss then shapes a new space by minimizing intra-class distances and maximizing inter-class distances. The proposed approach is evaluated on the VML-HD dataset, and the experiments show its effectiveness compared to the state of the art.
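To make the triplet idea in this abstract concrete, the following is a minimal sketch of a margin-based triplet loss computed on embeddings that are already available, followed by Euclidean ranking. The margin value, the use of non-squared distances, and the names `triplet_loss` and `rank_by_distance` are assumptions chosen for illustration; the abstract does not specify the exact formulation, and the CNN that would produce the embeddings is omitted.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on three embedding vectors.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (a hypothetical default).
    """
    d_pos = math.dist(anchor, positive)  # same-class distance, to be minimized
    d_neg = math.dist(anchor, negative)  # different-class distance, to be maximized
    return max(0.0, d_pos - d_neg + margin)


def rank_by_distance(query, candidates):
    """Return candidate indices sorted by Euclidean distance to the query."""
    return sorted(range(len(candidates)),
                  key=lambda i: math.dist(query, candidates[i]))


# Hypothetical 2-D embeddings.
satisfied = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
violated = triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
order = rank_by_distance((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0), (1.0, 1.0)])
```

In the first call the negative is already well separated, so the loss is zero; in the second the roles are swapped and the loss is positive, which is what would drive the embedding to change during training.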
... We ran a couple of experiments on the new dataset, using the Radial Descriptor [16] as well as the Radial Descriptor Graph [17]. First, we ran the algorithm on a subset of the dataset that contains 21 kinds of sub-words with 100 appearances each, totaling 2,100 sub-words, for each book. ...
... Most works on Arabic scripts [71,72,109,135] operate only at the partial-word level. Pieces of Arabic Words (PAW) are obtained either manually or from connected component analysis on the segmented words. ...
Article
Vast collections of documents available in image format need to be indexed for information retrieval purposes. In this framework, word spotting is an alternative solution to optical character recognition (OCR), which is rather inefficient for recognizing text of degraded quality and unknown fonts usually appearing in printed text, or writing style variations in handwritten documents. Over the past decade there has been a growing interest in addressing document indexing using word spotting which is reflected by the continuously increasing number of approaches. However, there exist very few comprehensive studies which analyze the various aspects of a word spotting system. This work aims to review the recent approaches as well as fill the gaps in several topics with respect to the related works. The nature of texts and inherent challenges addressed by word spotting methods are thoroughly examined. After presenting the core steps which compose a word spotting system, we investigate the use of retrieval enhancement techniques based on relevance feedback which improve the retrieved results. Finally, we present the datasets which are widely used for word spotting, we describe the evaluation standards and measures applied for performance assessment and discuss the results achieved by the state of the art.
Article
From the early days of pattern recognition, word spotting has been an important test bed for studying how well machines can make decisions. In recent years, word spotting has made dramatic advances, with state-of-the-art techniques reaching a high level of performance in real-life applications. The word spotting domain has driven research by providing suitable yet well-defined challenges for pattern recognition and document analysis practitioners. We continue in this direction by covering the extensive literature and new challenges in this domain, with a comparison of previous work. In particular, we cover the role of recent deep learning techniques in word spotting and the future scope of word spotting with deep learning. We believe that a suitable review of word spotting will be crucial not only for understanding this field today, but also for broader collaborative efforts, especially those involving artificial intelligence based tasks. To facilitate future research in word spotting, we discuss word spotting from the perspective of the learning environment, including its framework design, with components such as the query phase, preprocessing stages, segmentation, feature extraction, feature representation and matching strategies. Further, the workings of deep learning and its use in word spotting architectures are discussed. The study also includes an experimental comparison for the research community to evaluate algorithmic advances, along with benchmark datasets and future challenges in this field.