Figure 17 - uploaded by Asma Saïdani
Upper Lower profile. 

Source publication
Article
This paper gathers some contributions to the identification of script and its nature. Different sets of features have been employed successfully for discriminating between handwritten and machine-printed Arabic and Latin scripts. They include some well-established features, previously used in the literature, and new structural features which are intrinsic...

Context in source publication

Context 1
... lower profile: As noted in [8], whose authors used simple structural characteristics to discriminate machine-printed from handwritten Latin text, the height of printed characters is more or less stable within a text line, whereas the height of handwritten characters varies widely. These remarks also hold for the height of the main body of the character and for the heights of ascenders and descenders. Thus the ratio of ascender height to main-body height and the ratio of descender height to main-body height should be stable in printed text and variable in handwriting. To characterize a word by its upper-lower profile, we extracted the following features: R1 (the ratio of the ascender zone to the main body zone), R2 (the ratio of the descender zone to the main body zone) and R3 (the ratio of the area to the maximum value of the horizontal histogram of the upper-lower profile). Fig. 17 gives an example of computing these features on a machine-printed Arabic word. Note that connected components of diacritic points are not considered in the analysis of the upper lower ...
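
The ratios described in this excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binary word image given as a list of rows (1 = ink), estimates the main body zone as the rows whose horizontal projection reaches a hypothetical fraction `body_fraction` of the peak, and reads R3 as the total ink area divided by the histogram peak, computed on the word image itself rather than on an extracted profile. The names `zone_ratios` and `body_fraction` are illustrative only.

```python
def horizontal_projection(image):
    """Number of ink pixels in each row of a binary image."""
    return [sum(row) for row in image]


def zone_ratios(image, body_fraction=0.5):
    """Return (R1, R2, R3) for a binary word image.

    Assumption: the main body zone is the band of rows whose projection
    is at least body_fraction * peak. The source does not say how the
    zones are located, so this rule is only a stand-in.
    """
    hist = horizontal_projection(image)
    peak = max(hist)
    if peak == 0:
        raise ValueError("image contains no ink pixels")

    ink_rows = [i for i, v in enumerate(hist) if v > 0]
    body_rows = [i for i, v in enumerate(hist) if v >= body_fraction * peak]

    top, bottom = ink_rows[0], ink_rows[-1]
    body_top, body_bottom = body_rows[0], body_rows[-1]

    body_height = body_bottom - body_top + 1
    ascender_height = body_top - top          # rows above the main body
    descender_height = bottom - body_bottom   # rows below the main body

    r1 = ascender_height / body_height
    r2 = descender_height / body_height
    r3 = sum(hist) / peak                     # one possible reading of R3
    return r1, r2, r3


# Toy word: a two-row ascender, a two-row body, a one-row descender.
word = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
r1, r2, r3 = zone_ratios(word)
```

Under the stated assumptions, printed words from one font would be expected to yield similar R1 and R2 values, while handwritten words would spread more widely.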

Citations

... Gabor filter features achieved recognizable performance in handwriting recognition [16] [17] [18] [24] [25], machine-printed text recognition [26], writer identification [27], script identification [28] and handwritten vs. machine-printed text identification [29]. Gabor filters were utilized in implementing early feature learning frameworks inspired by the Hubel and Wiesel model, e.g., the Neocognitron [30], Cresceptron [31] and HMAX [32]. ...
Article
Holistic recognition of isolated words is an essential task in several daily life applications, e.g., bank check processing and postal address reading. In this work we present a system for the automatic recognition of Arabic handwritten words based on statistical features extracted by a Bag-of-Features framework that exploits the discriminative power of Gabor features. A handwritten text image is filtered by a set of Gabor filters of different scales and orientations to extract texture-based local features. The responses of the Gabor filters are organized into two layouts, viz. the Statistical Gabor Features and Gabor Descriptors, and fed to the Bag-of-Features framework, producing statistical representations of the handwritten text. The produced features are used in a holistic handwritten word recognition system that is applied to a public dataset of legal amounts from handwritten Arabic checks. The effective parameters of the two layouts as well as of the Bag-of-Features framework are experimentally evaluated, and the optimal values are used in reporting the final recognition accuracies. The best average recognition accuracy achieved by the produced features is 86.44%, which is promising for such a challenging dataset with a large number of classes.
... Benjelil et al., presented in [9] a performance comparison of curvelets, dual-tree complex wavelet and discrete wavelet transform in handwritten words classification (Arabic and Latin). In our previous works [1][2][3][4], we successfully employed different sets of features such as word vertical projection variance, baseline profile, run-length and crossing count histograms, some structural features and histograms of oriented gradients. ...
... Note that few works have been proposed for machine-printed/handwritten and Arabic/Latin script identification at word level. As can be seen, the use of the different proposed BRL features with a k-NN classifier achieves an accuracy of 98.92 %, which is slightly better than those of the systems proposed in [3,5]. Compared with the system proposed in [8], the accuracy difference is small (98.92 % compared to 99.10 %). ...
Article
In this work, we propose a texture-based approach to separate handwritten from machine-printed words written in Arabic and Latin scripts. The idea is to exploit differences in writing orientation and in stroke length to discriminate between these scripts. To that end, we designed a k nearest neighbors classifier trained with a set of texture features. These features are extracted from black run-length (BRL) histograms and seem to be suitable for finding structural characteristics in word images. Four feature extraction scenarios, (1) BRL, (2) restricted BRL, (3) BRL statistics and (4) restricted BRL combined with its statistics, are chosen to demonstrate the potential of such a texture-based approach in script identification. Exploiting these features, we obtained very promising results. The correct identification rate is higher than 98.92 % in our experiments.
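
As an illustration of the black run-length (BRL) histograms mentioned in this abstract, here is a minimal Python sketch. It assumes a binary word image stored as a list of rows (1 = black) and covers only horizontal and vertical scans. The function names are hypothetical, and the "restricted" variants and statistics named in the abstract are not reproduced because the abstract does not define them.

```python
from collections import Counter


def black_runs(line):
    """Lengths of consecutive runs of black (non-zero) pixels in one scan line."""
    runs = []
    length = 0
    for pixel in line:
        if pixel:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:                      # run that touches the end of the line
        runs.append(length)
    return runs


def brl_histogram(image, direction="horizontal"):
    """Map run length -> number of black runs of that length."""
    if direction == "horizontal":
        lines = image
    elif direction == "vertical":
        lines = [list(column) for column in zip(*image)]
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    histogram = Counter()
    for line in lines:
        histogram.update(black_runs(line))
    return dict(histogram)


sample = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]
horizontal = brl_histogram(sample, "horizontal")
vertical = brl_histogram(sample, "vertical")
```

A fixed-length feature vector could then be built by binning run lengths up to some maximum, a design choice the abstract leaves open.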
... Finally, the conclusions are presented in Section V. II. GABOR FILTERS RESPONSE FEATURES Gabor filters are powerful texture features applied in several vision applications including image segmentation [15], handwriting recognition [9] [10] [11], identifying the script of text images [16] [17] [18] and identifying the nature (handwritten or machine printed) of text in document images [19]. Gabor filter response features are obtained by convolving text images with a set of 2-D Gabor filters of different scales and orientations. ...
Conference Paper
Handwriting recognition is a challenging task due to the large variability in human handwriting. Improving the feature representations that rely on the visual appearance of the handwritten text would lead to better recognition. In this paper, we integrate two powerful appearance-based features to produce robust statistics for handwritten text. A handwritten text image is filtered by a set of Gabor filters of different scales and orientations to extract texture-based local features. The Gabor filter response features are organized into two layouts, viz. the Statistical Gabor Features and Gabor Descriptors, and fed to the Bag-of-Features framework for learning robust statistical representations of the handwritten text. The produced features are used in a holistic handwritten word recognition system and evaluated on a public dataset of handwritten Arabic subwords from the legal amounts of Arabic checks. The best average recognition accuracy achieved by the produced features is 86.44%, which is promising for such a challenging dataset with a large number of classes.
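
Filtering an image with a bank of 2-D Gabor filters of several scales and orientations, as described above, can be sketched in plain Python as follows. This is an illustration under assumptions, not the cited system: the kernel uses the common real-valued Gabor form (Gaussian envelope times cosine carrier), the ratio `sigma_ratio = 0.56` between the envelope width and the wavelength is an assumed default, and the per-filter feature (mean absolute response) is a placeholder for the layouts named in the abstract, which are not specified here.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, wavelengths, n_orientations, sigma_ratio=0.56):
    """One kernel per (wavelength, orientation) pair; sigma_ratio is an assumption."""
    bank = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma_ratio * wavelength))
    return bank


def filter_response(image, kernel):
    """Valid-mode 2-D correlation. With psi = 0 the kernel is point-symmetric,
    so the result equals a true convolution."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def mean_abs_responses(image, bank):
    """One placeholder feature per filter: the mean absolute response."""
    features = []
    for kernel in bank:
        values = [abs(v) for row in filter_response(image, kernel) for v in row]
        features.append(sum(values) / len(values))
    return features


# Toy 6x6 image with a vertical stroke, filtered at 2 scales x 4 orientations.
stroke = [[1 if col in (2, 3) else 0 for col in range(6)] for _ in range(6)]
bank = gabor_bank(5, wavelengths=[4.0, 8.0], n_orientations=4)
features = mean_abs_responses(stroke, bank)
```

For real images an optimized convolution routine would be preferable; the nested loops here only keep the sketch dependency-free.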
... Echi et al. [64] worked on machine-printed and handwritten words in Latin and Arabic scripts. From each word, the presence of diacritic points, loop positions and elongated descenders are calculated as a feature vector. ...
... The run-length histogram describes the character stroke lengths, whereas the crossing count histogram describes the complexity of strokes. From Table 7, the method by Spitz [49] showed full efficiency, but on a machine-printed database, whereas that of Echi et al. [64] is applicable to both machine-printed and handwritten databases. The length of words does not affect the features calculated in Pati and Ramakrishnan [2]. ...
Article
Script identification is a widely accepted technique for selecting the script-specific OCR (Optical Character Recognition) in multilingual document images. Extensive research has been done in this field, but it still suffers from low identification accuracy. This is due to faded document images and to illumination and positioning during scanning. Noise is also a major obstacle in the script identification process; it can be minimized up to a level, but cannot be removed completely. In this paper, an attempt is made to analyze and classify various script identification schemes for document images. These schemes are also compared, and their merits and demerits are discussed on a common platform. This will help researchers to understand the complexity of the issue and identify possible directions for research in this field.
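
One of the citing excerpts above contrasts run-length histograms with crossing count histograms. A crossing count can be sketched in Python as the number of white-to-black transitions along a scan line. This is a common definition assumed here, since the excerpt does not spell one out, and the function names are hypothetical.

```python
from collections import Counter


def crossing_count(line):
    """Number of white-to-black transitions along one scan line."""
    count = 0
    previous = 0
    for pixel in line:
        if pixel and not previous:
            count += 1
        previous = pixel
    return count


def crossing_count_histogram(image):
    """Map crossing count -> number of rows with that many crossings."""
    return dict(Counter(crossing_count(row) for row in image))


rows = [
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 0, 1],
]
histogram = crossing_count_histogram(rows)
```

Rows that cut through many separate strokes give high counts, which is why the histogram can serve as a rough measure of stroke complexity.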
... The majority of recent works devoted to script identification consider printed documents [30,26,27,31,29,32,23]. Only a few recent works handle both printed and handwritten documents [34,35,28,33]. The methods working at the document level are based on shape analysis. ...
... [34] performs a zone classification using a KNN and physical features extracted at both levels: the block level (number of occlusions, diacritics . . . ) and the connected component level (density, eccentricity . . . ). [35] performs a feature selection among the features proposed in the literature (projection profile, connected components width/height, steerable pyramid . . . ), compares different classifiers, and achieves the best performance with a Bayes classifier. ...
... Both approaches [28,33] use shape analysis and reach an average accuracy of around 95% on a private dataset composed of postal images [28] and on the IAM-DB and the University of Maryland datasets [33]. Two other methods [34,35] perform script identification as well as writing type discrimination. Also based on shape analysis combined with classifiers, these approaches achieve a global classification rate ranging from 88% in [34] to 98.72% in [35]. ...
Article
This paper presents a system dedicated to automatic language identification of text regions in heterogeneous and complex documents. This system is able to process documents with mixed printed and handwritten text and various layouts. To handle such a problem, we propose a system that performs the following sub-tasks: writing type identification (printed/handwritten), script identification and language identification. The methods for the writing type recognition and the script discrimination are based on the analysis of the connected components while the language identification approach relies on a statistical text analysis, which requires a recognition engine. We evaluate the system on a new public dataset and present detailed results on the three tasks. Our system outperforms the Google plug-in evaluated on the ground-truth transcriptions of the same dataset.
... We then explore the use of HOG-based shape descriptors for Arabic/Latin and handwritten/machine-printed at word level.
A/L P/H (400B) B 88.5%
[38] A/L P/H (400B) B 95%
[6] A/L P/H (800D) B 84.75%
[11] A/L P/H (600D) D 82%(A), 92%(L)
[5] A/L P (1976T, 8320W) T/W 99.7%(T), 96.8%(W)
[7] A/L P/H (800W) W 97.5%
[8] A/L P (4229W) W 94.32%
[23] A/L P/H (1720W) W 98.72%
[15] A/L H (1068W) W 100%(L), 98.88%(A)
[28] A
Zheng et al. [4] proposed features from Run-length histogram for machine-printed/handwritten Chinese character classification. Black pixel run-lengths are extracted in three directions: horizontal, vertical and diagonal. ...
... The highest rate of correct identification is obtained with gray-level co-occurrence matrices. In previous work [23], we tested all of the above script identification techniques to discriminate between Arabic/Latin and machine-printed/handwritten word. We present the obtained results for each technique on a database of 1720 words. ...
Article
In this paper, we present an approach for Arabic and Latin script and its type identification based on Histogram of Oriented Gradients (HOG) descriptors. HOGs are first applied at word level based on writing orientation analysis. Then, they are extended to word image partitions to capture fine and discriminative details. Pyramid HOG are also used to study their effects on different observation levels of the image. Finally, co-occurrence matrices of HOG are computed to consider spatial information between pairs of pixels, which is not taken into account in basic HOG. A genetic algorithm is applied to select the potentially informative feature combinations that maximize the classification accuracy. The output is a relatively short descriptor that provides an effective input to a Bayes-based classifier. Experimental results on a set of words, extracted from standard databases, show that our identification system is robust and provides good word script and type identification: 99.07% of words are correctly classified.
... Recently, Echi et al. [7] proposed a script identification technique for both printed and handwritten text for Arabic-Latin documents. They used many well-known features in addition to their own features to identify Arabic and English. ...
Conference Paper
In this paper, we present a novel methodology for multiple script identification using the sequence-learning capabilities of Long Short-Term Memory (LSTM) networks. Our method is able to identify multiple scripts at text-line level, where two or more scripts are present in the same text line. Unlike traditional techniques, where either shape features are extracted or bounding boxes of individual characters are extracted, the LSTM-based system learns a particular script in a supervised learning framework. Moreover, it needs neither specific features nor any preprocessing step other than text-line extraction and text-line normalization. The proposed method works at text-line level, where it identifies each character as belonging to a particular script. We have developed a database consisting of English and Greek script, and our system was able to achieve a script recognition accuracy of 98.186% on this dataset.
... In [10], Benjelil et al. present a performance comparison of curvelets, dual-tree complex wavelet and discrete wavelet transform in handwritten words classification (Arabic and Latin). In former works [1], [3], we successfully employed different sets of features such as word vertical projection variance, baseline profile, run-length and crossing count histograms, bottom diacritics, loop position and elongated descenders. ...
Conference Paper
In this paper, we propose a new scheme for script and nature identification. The objective is to discriminate between machine-printed/handwritten and Latin/Arabic scripts at word level. It is a relatively complex task due to the possible use of multiple fonts and sizes and to the complexity and variation of handwriting. In the proposed script identification system, we extract features from word images using the Co-occurrence Matrix of Oriented Gradients (Co-MOG). The classification is done using a k Nearest Neighbors (k-NN) classifier. Extensive experimentation has been carried out on 24000 words extracted from standard databases. An average identification accuracy of 99.85% is achieved, which clearly outperforms the results of some existing systems.
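
The k-NN classification step named in this abstract can be sketched generically in Python. The sketch assumes Euclidean distance, majority voting and k = 3, none of which the abstract specifies, and it uses made-up two-dimensional feature vectors in place of Co-MOG features. The labels "printed" and "handwritten" are illustrative.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Label the query by majority vote among its k nearest training samples.

    training_set is a list of (feature_vector, label) pairs.
    Ties are resolved in favor of the label met first when scanning
    neighbors from nearest to farthest (Counter keeps insertion order).
    """
    if k < 1 or k > len(training_set):
        raise ValueError("k must be between 1 and the training set size")
    neighbors = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up feature vectors for illustration only.
training = [
    ([0.0, 0.0], "printed"),
    ([0.0, 1.0], "printed"),
    ([5.0, 5.0], "handwritten"),
    ([6.0, 5.0], "handwritten"),
    ([5.0, 6.0], "handwritten"),
]
label_a = knn_predict(training, [0.2, 0.1], k=3)
label_b = knn_predict(training, [5.0, 5.5], k=3)
```

`math.dist` requires Python 3.8 or later.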
... In former works [25] and [30], we successfully employed different sets of features to identify machine-printed and handwritten words in Arabic and Latin scripts. They include features, previously used in the literature (such as word vertical projection variance, baseline profile, pixels distribution, run-length and crossing count histograms, the width, height, aspect ratio, area, density and profiles analysis of connected components, separator length between two successive connected components, etc.) and some new structural features (bottom diacritics, loop position and elongated descenders) which are intrinsic features of Arabic and Latin scripts. ...
Conference Paper
This paper considers the problem of script and nature identification at word level. We introduce Pyramid Histogram of Oriented Gradients (PHOG) features, which have been employed successfully for discriminating between handwritten and machine-printed Arabic and Latin scripts. Most of the image features used in previous identification systems are not effective at capturing differences between these scripts, especially because of their cursive nature. The proposed shape descriptor, PHOG, counts occurrences of gradient orientations in localized portions of an image. It has proved to be an efficient tool for describing the spatial distribution of pixels. A genetic algorithm is applied to improve the performance and generalization of the PHOG features. Experiments have been conducted using standard databases. An average identification rate of 98.3 percent was achieved using a Bayes-based classifier, which is clearly better than those reported in similar works.
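
Counting occurrences of gradient orientation in one image region, the building block of HOG and PHOG, can be sketched in Python as below. This is a single-cell illustration under assumptions, not the paper's pipeline: it uses central differences, unsigned orientations in [0, 180) degrees, 9 bins, magnitude-weighted votes and L2 normalization. The pyramid levels and the genetic algorithm are not reproduced, and `orientation_histogram` is a hypothetical name.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    image is a list of rows of numeric intensities; border pixels are skipped
    because central differences need both neighbors.
    """
    height, width = len(image), len(image[0])
    histogram = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle that rounds to 180.0.
            histogram[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in histogram))
    return [v / norm for v in histogram] if norm else histogram


# A vertical edge gives horizontal gradients (orientation 0 degrees),
# a horizontal edge gives vertical gradients (orientation 90 degrees).
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
h_vertical = orientation_histogram(vertical_edge)
h_horizontal = orientation_histogram(horizontal_edge)
```

A pyramid variant would apply the same function to progressively finer partitions of the word image and concatenate the resulting histograms.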