Table 1 - uploaded by Jawad H. Alkhateeb
Arabic Letter shapes

Source publication
Article
In this paper, we propose and describe efficient multiclass classification and recognition of unconstrained handwritten Arabic words using machine learning approaches which include the K-nearest neighbor (K-NN) clustering, and the neural network (NN). The technical details are presented in terms of three stages, namely preprocessing, feature extrac...

Context in source publication

Context 1
... the difference here is that each of the 28 letters of the Arabic alphabet has either two or four shapes depending on its position in the written text, and the whole text is written from right to left in a cursive way. Each letter may have up to four different shapes according to its position in the word, i.e. at the start, in the middle, at the end or alone [1], as shown in Table 1. For example, the letter Ayn (ع) has the following shapes: start عـ, middle ـعـ, end ـع, and alone ع. ...
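The positional shapes described above can be inspected programmatically. The following is a minimal sketch, assuming the Unicode "Arabic Presentation Forms-B" names are an acceptable stand-in for the shapes in Table 1; the helper `positional_forms` is hypothetical and not taken from the source publication.

```python
import unicodedata

# Position labels as they appear in Unicode character names
# (Arabic Presentation Forms-B block).
POSITIONS = ("ISOLATED", "INITIAL", "MEDIAL", "FINAL")

def positional_forms(letter_name):
    """Return {position: character} for the forms Unicode defines for a letter.

    `letter_name` is the Unicode name stem, e.g. "AIN" or "ALEF".
    Letters that do not join to the following letter have only two forms,
    so names that do not exist are skipped rather than treated as errors.
    """
    forms = {}
    for position in POSITIONS:
        name = f"ARABIC LETTER {letter_name} {position} FORM"
        try:
            forms[position.lower()] = unicodedata.lookup(name)
        except KeyError:
            pass  # this letter has no such positional form
    return forms

ain = positional_forms("AIN")    # a letter with four shapes
alef = positional_forms("ALEF")  # a letter with two shapes

for position, char in ain.items():
    print(position, char, f"U+{ord(char):04X}")
```

All four presentation forms of Ayn normalize back to the same base letter under NFKC, which is one way a recognizer's output labels could be collapsed to 28 letter classes.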

Similar publications

Article
In this paper we characterize Arabic and Latin ancient document images. The main criticism of existing works is that most of them address the characterization of Latin historical documents, and there are so far few methods that can discriminate between old document images in these different languages. Regions of images hav...
Article
A new method to recognise words in Arabic handwritten manuscript is presented. The method injects the spectral features extracted from an input word image to a group of previously trained word models. Each word model is a single hidden Markov model. The likelihood probability of the input pattern is calculated against each model and the pattern is...
Article
Nowadays, numerous corporations (such as Google, Baidu, etc.) require an efficient and effective search algorithm to crawl out the images with queried objects from databases. Moreover, privacy protection is a significant issue such that confidential images must be encrypted in corporations. Nevertheless, decrypting and then classifying millions of...
Article
This study aims to review the latest contributions in Arabic Optical Character Recognition (OCR) during the last decade, which helps interested researchers know the existing techniques and extend or adapt them accordingly. The study describes the characteristics of the Arabic language, different types of OCR systems, different stages of the Arabic...

Citations

... In Alamri et al. [6], the direction of the gradient was used as characteristics of the character image. The invariant moments were used as a characteristic by Amara and Zidi [8] and Al-Khateeb et al. [9]. Chergui et al. [10] have proposed a system that uses features invariant to scale for the recognition of Arabic handwritten words. ...
Chapter
Knowledge concerning the topography of Arabic letters, as well as the structural characteristics between background regions and character components is investigated as a novel approach for Arabic recognition. The suggested feature extraction method reduces the classifier input data to only the most significant and essential.
... Typical features applied to handwriting recognition include statistical features [16], structural features [17] and global transformations [24]. For classification, state-of-the-art classifiers including Artificial Neural Networks (ANN) [6], Support Vector Machine (SVM) [3,22], and Hidden Markov Models (HMM) [8,15] have been extensively applied to Arabic handwriting/word recognition. In some cases, classifiers are combined to improve the overall recognition rates [1,5,11,28]. ...
Article
This study investigates the combination of different classifiers to improve Arabic handwritten word recognition. Features based on Discrete Cosine Transform (DCT) and Histogram of Oriented Gradients (HOG) are computed to represent the handwritten words. The dimensionality of the HOG features is reduced by applying Principal Component Analysis (PCA). Each set of features is separately fed to two different classifiers, Support Vector Machine (SVM) and Fuzzy K-Nearest Neighbor (FKNN), giving a total of four independent classifiers. A set of different fusion rules is applied to combine the outputs of the classifiers. Evaluation of the proposed scheme on the IFN/ENIT database of Arabic handwritten words reveals that combining the classifiers results in improved recognition rates which, in some cases, outperform state-of-the-art recognition systems.
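The abstract does not list which fusion rules were used. As an illustration only, the sketch below applies four commonly used score-level rules (sum, product, max and majority vote) to hypothetical per-class scores from four classifiers; the function name `fuse` and the score values are assumptions, not taken from the study.

```python
from math import prod

def fuse(scores, rule="sum"):
    """Combine per-classifier class scores.

    scores: list of rows, one row per classifier, one column per class.
    Returns (index of the winning class, list of fused scores).
    """
    columns = list(zip(*scores))  # one tuple of classifier scores per class
    if rule == "sum":
        fused = [sum(col) for col in columns]
    elif rule == "product":
        fused = [prod(col) for col in columns]
    elif rule == "max":
        fused = [max(col) for col in columns]
    elif rule == "majority":
        # Each classifier votes for its own top-scoring class.
        votes = [max(range(len(row)), key=row.__getitem__) for row in scores]
        fused = [votes.count(k) for k in range(len(columns))]
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused

# Hypothetical scores for 3 word classes from 4 classifiers
# (e.g. DCT+SVM, DCT+FKNN, HOG+SVM, HOG+FKNN).
example_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.4, 0.5, 0.1],
    [0.1, 0.7, 0.2],
]

for rule in ("sum", "product", "max", "majority"):
    winner, fused = fuse(example_scores, rule)
    print(rule, winner, fused)
```

In this toy input the first classifier alone would pick class 0, while every fusion rule settles on class 1, which is the kind of correction classifier combination is meant to provide.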
... Various kinds of features can be found and/or calculated for an object in a PR system. Usually, features are categorized into global transformations [19], structural [20], statistical [21], and template-based matching [22]. ...
Article
Dimensionality reduction (feature selection) is an important step in pattern recognition systems. Although there are different conventional approaches for feature selection, such as Principal Component Analysis, Random Projection, and Linear Discriminant Analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach for dimensionality reduction is proposed. This method is based on one-dimensional and two-dimensional spectrum diagrams of standard deviation and minimum to maximum distributions for initial feature vector elements. The proposed algorithm is validated in an OCR application, by using two big standard benchmark handwritten OCR datasets, MNIST and Hoda. In the beginning, a 133-element feature vector was selected from the most used features proposed in the literature. Finally, the size of the initial feature vector was reduced from 100% to 59.40% (79 elements) for the MNIST dataset, and to 43.61% (58 elements) for the Hoda dataset. Meanwhile, the accuracies of the OCR systems are enhanced by 2.95% for the MNIST dataset, and by 4.71% for the Hoda dataset. The achieved results show an improvement in the precision of the system in comparison to the rival approaches, Principal Component Analysis and Random Projection. The proposed technique can also be useful for generating decision rules in a pattern recognition system using rule-based classifiers.
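The reported retention percentages follow directly from the element counts in the abstract. A small check, using a hypothetical helper `retained_percent`:

```python
def retained_percent(kept, total):
    """Share of the original feature vector that is kept, in percent."""
    return round(100 * kept / total, 2)

INITIAL_FEATURES = 133  # size of the initial feature vector in the abstract

mnist_retained = retained_percent(79, INITIAL_FEATURES)  # MNIST: 79 elements kept
hoda_retained = retained_percent(58, INITIAL_FEATURES)   # Hoda: 58 elements kept

print(mnist_retained, hoda_retained)
```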
... There are many works reported on the recognition of Arabic and Farsi texts [1, 10-23]. There are few works based on holistic recognition of Farsi/Arabic subwords by their shape information. ...
... In [21], five various methods have been used to extract the Arabic word image features. In one method, two-dimensional discrete wavelet transform (DWT) is used to extract the features, as it is well acknowledged that DWT coefficients can provide a powerful insight into an image's frequency and spatial characteristics. ...
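To make the DWT step concrete, here is a minimal sketch of one decomposition level of a two-dimensional Haar transform, with sub-band energies as a toy feature vector. The choice of the Haar wavelet, the orthonormal scaling, the sub-band names and the energy features are all assumptions for illustration; the excerpt does not say which wavelet or features were used in [21].

```python
def haar_dwt2(image):
    """One level of the 2-D orthonormal Haar transform.

    image: list of equal-length rows with even height and width.
    Returns four sub-bands, each half the size of the input:
    approximation, row difference, column difference, diagonal difference.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2)  # low-pass in both directions
            r_row.append((p + q - s - t) / 2)  # top row minus bottom row
            c_row.append((p - q + s - t) / 2)  # left column minus right column
            d_row.append((p - q - s + t) / 2)  # diagonal difference
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag

def band_energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for row in band for v in row)

sample = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
bands = haar_dwt2(sample)
features = [band_energy(b) for b in bands]
print(features)
```

With the orthonormal scaling used here the total energy of the image equals the sum of the four sub-band energies, so the feature vector partitions the image energy between coarse structure and directional detail.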
Article
In this paper, we present a new approach to offline OCR (optical character recognition) for printed Persian subwords using wavelet packet transform. The proposed algorithm is used to extract font-invariant and size-invariant features from 87804 subwords of 4 fonts and 3 sizes. The feature vectors are compressed using PCA. The obtained feature vectors yield a pictorial dictionary in which each entry is the mean of a group that consists of the same subword in 4 fonts and 3 sizes. These feature sets are then combined with dot features for the recognition of printed Persian subwords. To evaluate the feature extraction results, the algorithm was tested on a set of 2000 subwords in printed Persian text documents. An encouraging recognition rate of 97.9% was obtained at the subword level.
... However, in our case the dataset size is much larger and needs to be preprocessed. The IFN/ENIT dataset is used and good results are obtained by [22]. However, our proposed framework also outperforms this framework. ...
Conference Paper
In recent years, rapidly developing handwritten word recognition techniques have drawn researchers' attention to Arabic word classification. Arabic has a cursive style of writing, so it needs a special framework for classification. In this paper, a precise framework for Arabic word classification is presented, which uses sparse coding with the spatial pyramid matching (SPM) algorithm and a linear support vector machine classifier. SPM maps each feature set to a multi-resolution histogram that preserves the individual features at the finest level. The histogram pyramids are then compared by using a weighted histogram intersection algorithm. Our proposed framework is evaluated with four publicly available datasets: IFN/ENIT, PATS-A01, IFHCDB and ISI Bangla numeral. Experimental results show that the proposed framework outperforms state-of-the-art methods used for Arabic word classification. Keywords: Arabic Character Recognition; linear support vector machine (LSVM); Spatial Pyramid Matching (SPM); Scale invariant feature transform (SIFT).
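The weighted histogram intersection can be sketched as follows. This is a minimal illustration that assumes the level weights of the standard spatial pyramid match kernel (1/2^L for level 0 and 1/2^(L-l+1) for level l >= 1) and hard-assigned visual words; the paper itself uses sparse coding, which is not reproduced here, and the function names are hypothetical.

```python
def build_pyramid(points, vocab_size, levels):
    """Build one histogram per pyramid level.

    points: (x, y, word) triples with x, y in [0, 1) and word in
    range(vocab_size). Level l splits the image into 2**l x 2**l cells and
    concatenates the per-cell visual-word histograms.
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [0] * (cells * cells * vocab_size)
        for x, y, word in points:
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(cy * cells + cx) * vocab_size + word] += 1
        pyramid.append(hist)
    return pyramid

def histogram_intersection(h1, h2):
    """Number of matches between two histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def pyramid_match(pyr1, pyr2):
    """Weighted sum of per-level intersections; finer levels weigh more."""
    top = len(pyr1) - 1  # index L of the finest level
    total = 0.0
    for level, (h1, h2) in enumerate(zip(pyr1, pyr2)):
        weight = 1 / 2 ** top if level == 0 else 1 / 2 ** (top - level + 1)
        total += weight * histogram_intersection(h1, h2)
    return total

# Two toy "images", each with two local features (visual words 0 and 1).
image_a = build_pyramid([(0.1, 0.1, 0), (0.9, 0.9, 1)], vocab_size=2, levels=2)
image_b = build_pyramid([(0.9, 0.1, 0), (0.9, 0.9, 1)], vocab_size=2, levels=2)

print(pyramid_match(image_a, image_a), pyramid_match(image_a, image_b))
```

Both toy images contain the same visual words, so they match fully at level 0; the score drops at finer levels because one feature lies in a different cell.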
... Using Moment Invariants (MI) is very useful in this case because they serve as shape descriptors in computer vision [2]. They have been applied effectively in many pattern recognition fields, including handwriting recognition [3]. Moreover, the affine invariant functions of Flusser [4] and Suk and Flusser [5] are among the better descriptors. ...
Article
Moment Invariants (MI) have been frequently used as features for shape recognition. These features are invariant to several deformations such as rotation, scaling and translation. However, they are sensitive to distortions that primarily affect the 'centre of gravity' of the image. Images of an Arabic word might have different centroids because the word might be written in different handwriting styles. In this paper we examine the effect of replacing the image centroid with the centre of the image as the reference point in the Moment Invariants. The new descriptor set was tested on recognizing Arabic words from the IFN/ENIT database, which consists of 26459 words written by 411 different writers. A Back Propagation Neural Network was used as the classifier. Experimental results show that using the new descriptors increased the average recognition accuracy by 18.38%.
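The change of reference point can be illustrated with the first Hu invariant, phi1 = eta20 + eta02, where eta_pq = mu_pq / m00^(1 + (p+q)/2). The sketch below is a simplified illustration, not the paper's descriptor set: the function `hu_phi1`, the definition of the image centre as ((width-1)/2, (height-1)/2) and the toy images are assumptions.

```python
def hu_phi1(image, reference="centroid"):
    """First Hu moment invariant of an image given as a list of rows.

    reference="centroid" uses the centre of gravity of the pixel mass;
    reference="centre" uses the geometric centre of the image array.
    """
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image has no foreground mass")
    if reference == "centroid":
        x0 = sum(x * v for row in image for x, v in enumerate(row)) / m00
        y0 = sum(y * v for y, row in enumerate(image) for v in row) / m00
    elif reference == "centre":
        x0 = (len(image[0]) - 1) / 2
        y0 = (len(image) - 1) / 2
    else:
        raise ValueError(f"unknown reference point: {reference}")
    # Second-order moments about the chosen reference point.
    mu20 = sum((x - x0) ** 2 * v for row in image for x, v in enumerate(row))
    mu02 = sum((y - y0) ** 2 * v for y, row in enumerate(image) for v in row)
    # For p + q = 2 the normalising exponent 1 + (p + q) / 2 equals 2.
    return (mu20 + mu02) / m00 ** 2

def blank(size):
    return [[0] * size for _ in range(size)]

# The same 2x2 blob placed at two positions in a 5x5 image.
top_left = blank(5)
shifted = blank(5)
for dy in (0, 1):
    for dx in (0, 1):
        top_left[dy][dx] = 1
        shifted[2 + dy][2 + dx] = 1

for ref in ("centroid", "centre"):
    print(ref, hu_phi1(top_left, ref), hu_phi1(shifted, ref))
```

In this toy case the centroid-based value is identical for both placements, while the centre-based value depends on where the blob sits in the image. The sketch only shows how the reference point changes the descriptor; it says nothing about recognition accuracy.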
... The invariant Hu moments are computed upon the sequences Sc and Scc of the segmented letters. The moments are calculated as in [20]. An overview of various moments is given in [21]. ...
Article
The world heritage of handwritten Arabic documents is huge; however, only manual indexing and retrieval techniques are available for the content of these documents. To facilitate automatic retrieval of such handwritten Arabic documents, a number of automatic recognition systems for handwritten Arabic words have been proposed. Nevertheless, these systems suffer from low recognition accuracy due to the peculiarities of handwritten Arabic. Thus, in this paper we propose a segmentation-based recognition system for handwritten Arabic words. We divide a handwritten word into smaller pieces, and these pieces are then segmented into candidate letters. The candidate letters are converted into their corresponding chain-code representation. Thereafter we extract discrete, statistical and structural features for classification. Additionally, we introduce a novel active-contour-based feature to increase the recognition accuracy of strongly deformed Arabic letters. We also use a decision tree to reduce the number of potential classes. We then use a neural network to compute weights for all statistical features and use them as input for a k-NN classifier. Our experiments show that the features extracted by our technique achieve higher recognition accuracy than other features.
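The chain-code conversion can be sketched with a Freeman 8-direction code. The direction numbering below (0 = east, counted counter-clockwise, with the image y axis pointing down) and the normalized direction histogram are assumptions for illustration; the abstract does not specify the paper's exact convention or features.

```python
# Freeman 8-direction codes keyed by (dx, dy) between consecutive contour
# points. Assumed convention: 0 = east, counter-clockwise, image y axis down.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}

def chain_code(points):
    """Convert an ordered list of 8-connected (x, y) points to chain codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points are not 8-connected neighbours: {step}")
        codes.append(DIRECTIONS[step])
    return codes

def direction_histogram(codes):
    """Relative frequency of each of the 8 directions (a simple feature)."""
    total = len(codes) or 1
    return [codes.count(d) / total for d in range(8)]

# Closed contour of a unit square, traced from its top-left corner.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)
print(codes, direction_histogram(codes))
```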
... The invariant Hu moments are computed upon the sequences Sc and Scc of the segmented letters. The moments are calculated as in [20]. An overview of various moments is given in [21]. ...
Article
A precise and efficient segmentation of handwritten Arabic text is a vital prerequisite for the accuracy of the subsequent recognition phase. In this paper, we present a dual-phase segmentation approach. The proposed approach first detects and resolves overlapping sub-words, then applies a segmentation based on topological features by means of a set of heuristic rules. Because of its crucial importance, the segmentation phase is preceded by a handwriting-specific preprocessing phase that addresses issues such as word skew and slant correction. The proposed approach has been successfully tested on a database of handwritten Arabic words that contains more than 3000 word images. The results were very promising and indicate the efficiency of our approach.