Study of Different Features on Handwritten Devnagari Character

Abstract

In this paper a scheme for offline handwritten Devnagari character recognition is proposed, which uses different feature extraction and recognition algorithms. The proposed system assumes no constraints on writing style, size or variations. First the character is preprocessed, and features, namely chain code histogram, four side views and shadow-based features, are extracted and fed to multilayer perceptrons (MLPs) as a preliminary recognition step. Finally, the results of all MLPs are combined using a weighted majority scheme. The proposed system is tested on a database of 1500 handwritten Devnagari characters collected from different people. The proposed system achieves a recognition rate of 98.61% when the top 5 results are considered and 89.58% for the top 1 result.
Keywords: Classification, Multilayer Perceptron, Feature Extraction, Weighted Majority Scheme
I. INTRODUCTION
Although the first research report on handwritten Devnagari characters was published in 1977 [4], little research work was done after that. Researchers have now started to work on handwritten Devnagari characters, and a few research reports have been published recently. Hanmandlu and Murthy [5, 14] proposed fuzzy-model-based recognition of handwritten Hindi numerals and characters, obtaining 92.67% accuracy for handwritten Devnagari numerals and 90.65% accuracy for handwritten Devnagari characters. Bajaj et al. [6] employed three different kinds of features, namely density features, moment features and descriptive component features, for classification of Devnagari numerals. They proposed a multi-classifier connectionist architecture to increase recognition reliability and obtained 89.6% accuracy for handwritten Devnagari numerals. Kumar and Singh [7] proposed a Zernike moment feature based approach for Devnagari handwritten character recognition, using an artificial neural network for classification. Sethi and Chatterjee [8] proposed a decision tree based approach for recognition of constrained hand-printed Devnagari characters using primitive features. Bhattacharya et al. [9] proposed a Multi-Layer Perceptron (MLP) neural network based classification approach for the recognition of Devnagari handwritten numerals and obtained 91.28% accuracy. Sharma et al. [1] proposed a quadratic classifier based on directional chain code features and obtained 80.36% accuracy for handwritten Devnagari characters and 98.86% accuracy for handwritten Devnagari numerals. In most of the works reported above, multiple classifier combination has not been reported for handwritten Devnagari characters; most of them are based on a single classifier or are reported for handwritten Devnagari numerals.
In this paper we present the results of various feature extraction techniques applied to handwritten Devnagari characters. The different features are evaluated individually using MLP classifiers, and their combination is also evaluated: the results of all MLPs are combined using a weighted majority scheme.
Our feature set consists of chain code histogram, shadow and view-based features. Chain code histogram features are extracted from the contour of the scaled image, shadow features from the scaled image, and view-based features from the scaled and thinned character image. These features are then fed to multilayer perceptrons for recognition.
The rest of the paper is organized as follows. Section II discusses the peculiarities of the Devnagari script. Feature extraction techniques are described in Section III. Section IV deals with the classifiers used for recognition, and the experimental results are discussed in Section V.
II. PECULIARITIES OF DEVNAGARI SCRIPT
The Devnagari script differs from the Roman script in several ways. This script has two-dimensional compositions of symbols: core characters in the middle strip, with optional modifiers above and/or below the core characters. Two characters may be in the shadow of each other. While line segments (strokes) are the predominant features of English, most Devnagari characters are formed by curves, holes and strokes. The Devnagari script has no concept of upper-case and lower-case characters; however, its alphabet contains more symbols than that of English.
The Devnagari script has about 14 vowels and 33 consonants, giving a total of 47 or more basic characters. Vowels occur either in isolation or in combination with consonants. Apart from the vowels and consonants, called basic characters, the Devnagari alphabet system contains compound characters, which are formed by combining two or more basic characters. The shape of a compound character is usually more complex than that of its constituent basic characters. In addition, each of the 33 consonants commonly takes more than twelve forms, giving rise to modified shapes that depend on whether the vowel is placed to the left, right, top or bottom of the character; these are called modified characters. The net result is that there are several thousand different shapes or patterns, which may, in addition, be connected to each other without any visible separation. This makes a Devnagari OCR system more difficult to develop.

S. Arora¹, D. Bhattacharjee², M. Nasipuri², D.K. Basu², M. Kundu², L. Malik³
¹Meghnad Saha Institute of Technology, Kolkata-107, India
Email: sandhyabhagat@yahoo.com
²Department of Computer Science and Engg., Jadavpur University, Kolkata, India
³G.H. Raisoni College of Engineering, Nagpur, India
Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09
978-0-7695-3884-6/09 $26.00 © 2009 IEEE
Figure 1: Samples of handwritten Devnagari (a) vowels, (b) consonants and (c) compound characters
III. FEATURE EXTRACTION
In the following we give a brief description of the feature sets used in our proposed system. Chain code histogram features are extracted by chain coding the contour points of the scaled, bitmapped character image. View-based features are extracted from the scaled, thinned, one-pixel-wide skeleton of the character image. Shadow features are extracted from the scaled character image.
A. Shadow Features of Character
For computing shadow features [13], the rectangular boundary enclosing the character image is divided into eight octants. For each octant, the shadow of the character segment is computed on the sides of the octant, three values per octant, so a total of 8 × 3 = 24 shadow features are obtained. A shadow is basically the length of the projection on a side, as shown in Figure 2. These features are computed on the scaled image.
Figure 2. Shadow features
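The shadow computation can be sketched in Python with NumPy as below. The text does not fully pin down the octant geometry, so this sketch assumes (hypothetically) that the bounding square is cut into eight triangles by its two midlines and two diagonals, and that each triangle's shadow is measured on its three sides: the two perpendicular, axis-aligned sides and the diagonal. That reading yields the stated 8 × 3 = 24 values. Each shadow here is the number of distinct unit bins covered by the projection, divided by the side length and clipped to [0, 1]; the function name `shadow_features` is illustrative only.

```python
import numpy as np


def shadow_features(img):
    """Sketch: 24 octant shadow features of a square, scaled binary image.

    Hypothetical geometry: the two midlines and two diagonals cut the square into
    8 triangular octants; each octant's shadow is measured on its 3 sides.
    """
    img = np.asarray(img, dtype=bool)
    n = img.shape[0]
    assert img.shape == (n, n), "this sketch assumes a square, scaled image"
    rows, cols = np.nonzero(img)
    # Pixel-centre coordinates relative to the centre of the square.
    dx = cols + 0.5 - n / 2.0
    dy = rows + 0.5 - n / 2.0
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    octant = np.floor(angle / (np.pi / 4)).astype(int) % 8
    half = n / 2.0               # length of each axis-aligned octant side
    diag = half * np.sqrt(2.0)   # length of each diagonal octant side
    feats = []
    for k in range(8):
        m = octant == k
        # Octants 0, 1, 4, 5 border the (1, 1) diagonal; 2, 3, 6, 7 the (1, -1) one.
        sign = 1.0 if k in (0, 1, 4, 5) else -1.0
        u = np.floor((dx[m] + sign * dy[m]) / np.sqrt(2.0))
        shadows = (
            np.unique(cols[m]).size / half,  # shadow on the horizontal side
            np.unique(rows[m]).size / half,  # shadow on the vertical side
            np.unique(u).size / diag,        # shadow on the diagonal side
        )
        feats.extend(min(s, 1.0) for s in shadows)
    return np.array(feats)
```

A fully black square casts nearly full shadows on every side of every octant, while an empty image gives 24 zeros.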
B. Chain Code Histogram of Character Contour
Given a scaled binary image, we first find the contour points of the character image. We consider a 3 × 3 window centred on each object point of the image. If any of the 4-connected neighbours of the object point P, shown in Figure 3, is a background point, then P is considered a contour point.
Figure 3. Contour point detection
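The contour-point rule above is easy to express with array shifts. The following is a minimal NumPy sketch; the function name `contour_points` is hypothetical, and padding the image with background (so that object pixels on the image border count as contour points) is an assumption the text does not state.

```python
import numpy as np


def contour_points(img):
    """Mark object pixels that have at least one 4-connected background neighbour."""
    obj = np.asarray(img, dtype=bool)
    # Pad with background so pixels on the image border are treated as contour points.
    p = np.pad(obj, 1, constant_values=False)
    up, down = p[:-2, 1:-1], p[2:, 1:-1]
    left, right = p[1:-1, :-2], p[1:-1, 2:]
    interior = up & down & left & right   # all four neighbours are object points
    return obj & ~interior
```

On a filled 3 × 3 square, the eight boundary pixels are marked and the centre pixel is not.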
The contour-following procedure uses a contour representation called “chain coding”, proposed by Freeman [15] and shown in Figure 4(a). Each contour pixel is assigned a code that indicates the direction of the next contour pixel along the chosen direction of travel. The chain code gives the position of each point relative to the previous one, independent of the coordinate system. By chain coding the connections between neighbouring contour pixels, both the contour points and the outline are captured. The contour-following procedure may proceed in a clockwise or a counter-clockwise direction; here, we have chosen to proceed clockwise.
Figure 4. Chain coding: (a) direction of connectivity, (b) 4-connectivity, (c) 8-connectivity. The chain code is generated by detecting the direction of the next-in-line pixel.
The chain code of the character contour yields a smooth, unbroken curve as it grows along the perimeter of the character and completely encompasses it. When the character has multiple connectivity, there can be multiple chain codes representing its contour; in that case we chose to move in the direction with the minimum chain code number first.
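Clockwise contour following of this kind can be sketched with Moore-neighbour tracing, a standard boundary-following technique used here as a stand-in. It is not necessarily the authors' exact procedure, and it does not implement their minimum-code-first rule at multiply connected points. Codes follow the usual Freeman convention (0 = east, increasing counter-clockwise in 45° steps, with image rows growing downward). The tracer stops when it is back at the start pixel and about to repeat its first move. All names are hypothetical.

```python
import numpy as np

# Freeman directions as (row, col) offsets. Code k points 45*k degrees
# counter-clockwise from east on screen (rows grow downward).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
CODE_OF = {off: k for k, off in enumerate(OFFSETS)}


def trace_chain_code(img):
    """Clockwise Moore-neighbour trace of the outer contour of the first object.

    Returns (points, codes): the pixel each move starts from and its Freeman code.
    """
    grid = np.pad(np.asarray(img, dtype=bool), 1)   # background border
    objs = np.argwhere(grid)
    if len(objs) == 0:
        return [], []
    start = tuple(int(v) for v in objs[0])          # top-most, then left-most pixel
    p, back = start, (start[0], start[1] - 1)       # its west neighbour is background
    first_move = None
    points, codes = [], []
    for _ in range(4 * grid.size):                  # safety cap against endless loops
        c_back = CODE_OF[(back[0] - p[0], back[1] - p[1])]
        prev = back
        for i in range(1, 9):                       # clockwise = decreasing code
            k = (c_back - i) % 8
            q = (p[0] + OFFSETS[k][0], p[1] + OFFSETS[k][1])
            if grid[q]:
                break
            prev = q
        else:
            break                                   # isolated pixel: nothing to trace
        if p == start and q == first_move:
            break                                   # about to repeat the first move
        if first_move is None:
            first_move = q
        points.append((p[0] - 1, p[1] - 1))         # undo the padding offset
        codes.append(k)
        p, back = q, prev
    return points, codes
```

For a filled 3 × 3 square the trace is east, east, south, south, west, west, north, north, i.e. codes `[0, 0, 6, 6, 4, 4, 2, 2]`.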
We divide the contour image into 5 × 5 blocks. In each block, the frequency of each direction code is computed and a chain code histogram is prepared. Thus, for 5 × 5 blocks we get 5 × 5 × 8 = 200 features for recognition.
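Given the starting pixels and Freeman codes of a traced contour (from any contour follower), the 200-dimensional block histogram can be sketched as follows. The function name and the convention of assigning each code to the block containing its starting pixel are assumptions; counts are left unnormalized, matching the “frequency” wording.

```python
import numpy as np


def chain_code_histogram(points, codes, shape, blocks=5):
    """Per-block histogram of Freeman codes (blocks x blocks x 8 features).

    points: (row, col) of the pixel each move starts from; codes: Freeman codes 0-7;
    shape: (height, width) of the scaled image.
    """
    h, w = shape
    hist = np.zeros((blocks, blocks, 8))
    for (r, c), k in zip(points, codes):
        br = min(r * blocks // h, blocks - 1)   # block row containing the pixel
        bc = min(c * blocks // w, blocks - 1)   # block column containing the pixel
        hist[br, bc, k] += 1
    return hist.ravel()                          # 5 x 5 x 8 = 200 values by default
```

For a 10 × 10 image, codes starting at (0, 0) and (0, 1) fall in the top-left block, and a code starting at (9, 9) falls in the bottom-right block.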
C. View based features
This method is based on the fact that, for correct character recognition, a human usually needs only partial information about the character: its shape and contour. This feature extraction method examines four “views” of each character and extracts from them a characteristic vector that describes the character. A view is a set of points that plots one of four projections of the object (top, bottom, left and right); it consists of the pixels belonging to the contour of the character that have extreme values of one of their coordinates. For example, the top view of a letter is the set of points having the maximal y coordinate for a given x coordinate. Next, characteristic points are marked out on the surface of each view to describe the shape of that view (Figure 5). The method of selecting these points and their number may vary from one letter to another. In the examples considered, eleven uniformly distributed characteristic points are taken for each view.
Figure 5. Selecting characteristic points for four views
The next step is to calculate the y coordinates of the points on the top and bottom views, and the x coordinates of the points on the left and right views. These quantities are normalized so that their values lie in the range [0, 1]. The 44 values obtained (4 views × 11 points) form the characteristic vector that describes the given character and is the basis for further analysis and classification.
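A sketch of the 44-value view vector follows. Assumptions (not from the paper): characteristic points sit at 11 uniformly spaced image columns (for the top and bottom views) and rows (for the left and right views); y is measured upward so that the top view takes the smallest row index; a column or row with no object pixel gets 0. The input is a scaled binary image, which could first be thinned (for example with scikit-image's `skeletonize`, not shown). The function name is hypothetical.

```python
import numpy as np


def view_features(img, n_points=11):
    """Top, bottom, left and right views sampled at n_points positions each."""
    img = np.asarray(img, dtype=bool)
    h, w = img.shape
    cols = np.round(np.linspace(0, w - 1, n_points)).astype(int)
    rows = np.round(np.linspace(0, h - 1, n_points)).astype(int)
    top, bottom, left, right = [], [], [], []
    for c in cols:
        r = np.nonzero(img[:, c])[0]
        # y grows upward: top view = maximal y = smallest row index.
        top.append((h - 1 - r.min()) / (h - 1) if r.size else 0.0)
        bottom.append((h - 1 - r.max()) / (h - 1) if r.size else 0.0)
    for rr in rows:
        c = np.nonzero(img[rr, :])[0]
        left.append(c.min() / (w - 1) if c.size else 0.0)
        right.append(c.max() / (w - 1) if c.size else 0.0)
    return np.array(top + bottom + left + right)   # 4 * n_points values, in [0, 1]
```

For a horizontal bar across the middle row of an 11 × 11 image, both the top and bottom views are 0.5 at every sampled column.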
IV. CHARACTER RECOGNITION
We used three different MLPs, each with three layers including one hidden layer, for the three feature sets, consisting of 200 chain code histogram features, 24 shadow features and 44 view-based features. The experimental results obtained using these features for recognition of handwritten Devnagari characters are presented in Section V. At this stage all characters are non-compound, single characters, so no segmentation is required.
Each MLP is trained with the backpropagation learning algorithm with momentum [9]. It minimizes the sum of squared errors for the training samples by conducting a gradient descent search in the weight space. The sigmoid function is used as the activation function. The learning rate and momentum term are set to 0.8 and 0.7 respectively. The numbers of neurons in the input layers of the MLPs are 200, 24 and 44 for chain code histogram, shadow and view-based features respectively. The number of neurons in the hidden layer is not fixed: we experimented with values between 20 and 50 to obtain the optimal result, and it was finally set to 50, 30 and 40 for chain code histogram, shadow and view-based features respectively. The output layer contains one node for each class, so the number of neurons in the output layer is 20.
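The training rule described above (per-sample backpropagation with momentum on the sum of squared errors, sigmoid units, learning rate 0.8, momentum 0.7) can be sketched in NumPy as follows. The function names, the weight initialisation range, the shuffled per-sample update order and the epoch count are assumptions, not the authors' settings.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, T, n_hidden, lr=0.8, momentum=0.7, epochs=100, seed=0):
    """One-hidden-layer MLP trained by per-sample backpropagation with momentum.

    Minimises E = 0.5 * sum((t - y)**2); sigmoid units in both layers.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))   # last row holds biases
    W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))
    step1, step2 = np.zeros_like(W1), np.zeros_like(W2)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = np.append(X[i], 1.0)
            h = np.append(sigmoid(x @ W1), 1.0)
            y = sigmoid(h @ W2)
            # Local gradients (error terms) of the output and hidden units.
            delta_out = (T[i] - y) * y * (1.0 - y)
            delta_hid = (W2[:-1] @ delta_out) * h[:-1] * (1.0 - h[:-1])
            # Gradient-descent step plus momentum times the previous step.
            step2 = lr * np.outer(h, delta_out) + momentum * step2
            step1 = lr * np.outer(x, delta_hid) + momentum * step1
            W2 += step2
            W1 += step1
    return W1, W2


def mlp_scores(X, W1, W2):
    """Class confidences (sigmoid outputs) for each row of X."""
    ones = np.ones((len(X), 1))
    H = sigmoid(np.hstack([X, ones]) @ W1)
    return sigmoid(np.hstack([H, ones]) @ W2)
```

Under these assumptions, the chain code network would correspond to 200-dimensional inputs, `n_hidden=50` and 20 one-hot target columns; the per-class outputs of `mlp_scores` play the role of the confidences combined in the next subsection.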
A. Classifier Combination
The ultimate goal of designing a pattern recognition system is to achieve the best possible classification performance. This objective has traditionally led to the development of different classification schemes for any pattern recognition problem to be solved. The result of an experimental assessment of the different designs would then be the basis for choosing one of the classifiers as the final solution to the problem. It has been observed in such design studies that, although one of the designs would yield the best performance, the sets of patterns misclassified by the different classifiers would not necessarily overlap. This suggests that different classifier designs potentially offer complementary information about the pattern to be classified, which could be harnessed to improve the performance of the selected classifier. So, instead of relying on a single decision-making scheme, we can combine classifiers.
We have three neural network classifiers, as discussed above, trained on the 200 chain code histogram features, 24 shadow features and 44 view-based features respectively. Their outputs are confidences associated with each class. Since these outputs cannot be compared directly, we used an aggregation function to combine the results of all three classifiers.
Our strategy is based on a weighted majority voting scheme, described below.
If the decision of the kth classifier to assign the unknown pattern to the ith class is denoted by $O_{ik}$, with $1 \le i \le m$ and m the number of classes, then the final combined decision $d_i^{com}$ supporting assignment to the ith class takes the form

$$ d_i^{com} = \sum_{k=1}^{3} \omega_k \, O_{ik}, \qquad 1 \le i \le m $$

The final decision $d^{com}$ is therefore

$$ d^{com} = \max_{1 \le i \le m} d_i^{com} $$

where the weights are

$$ \omega_k = \frac{d_k}{\sum_{j=1}^{3} d_j} $$

Here m = 20, and $\omega_1$, $\omega_2$ and $\omega_3$ are 0.384, 0.354 and 0.262 respectively, since $d_1 > d_2 > d_3$, where
$d_1$ = 88.19%, the result of the classifier trained with chain code histogram features;
$d_2$ = 81.25%, the result of the classifier trained with shadow features;
$d_3$ = 60.07%, the result of the classifier trained with view-based features.
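The combination rule can be checked numerically. This sketch computes $\omega_k$ from the three reported accuracies and applies the weighted sum to hypothetical confidence vectors for a two-class toy problem (the paper uses m = 20). The function names and the toy confidences are illustrative only.

```python
import numpy as np


def combination_weights(accuracies):
    """omega_k = d_k / sum_j d_j."""
    d = np.asarray(accuracies, dtype=float)
    return d / d.sum()


def weighted_majority(outputs, weights):
    """outputs[k][i] = O_ik, the confidence of classifier k for class i.

    Returns the combined scores d_i^com and the index of the winning class.
    """
    O = np.asarray(outputs, dtype=float)          # shape (3, m)
    d_com = np.asarray(weights) @ O               # sum_k omega_k * O_ik
    return d_com, int(np.argmax(d_com))


w = combination_weights([88.19, 81.25, 60.07])    # rounds to 0.384, 0.354, 0.262
scores, winner = weighted_majority([[0.7, 0.3], [0.4, 0.6], [0.45, 0.55]], w)
```

In the toy call, two of the three classifiers favour class 1, yet class 0 wins (combined scores of about 0.528 and 0.472) because the most accurate classifier is also the most confident one.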
V. RESULTS
The experimental evaluation of the above technique was carried out using isolated Devnagari characters collected from different people. A total of 1500 samples of Devnagari basic characters (vowels as well as consonants) are used in our experiment, of which 65% are used for training and the rest for testing. The recognition accuracies obtained from the classifiers discussed above, taken separately, are shown in Table I. Three MLPs are designed, for chain code histogram based, four side view based and shadow based features. The results of the three MLPs are combined using the weighted majority scheme discussed above. The combined MLP gives 98.61% accuracy when the top 5 choices are considered (Table II).
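The top-k figures in Table II count a sample as correct when its true class is among the k highest-scoring classes. A minimal sketch, with a hypothetical helper name and toy scores:

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)
    top_k = np.argsort(-scores, axis=1)[:, :k]    # class indices, best first
    return float(np.mean([y in row for y, row in zip(labels, top_k)]))


toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [2, 0, 0]
```

On the toy data, top-1, top-2 and top-3 accuracies are 1/3, 2/3 and 1, showing why accuracy can only grow as more choices are admitted.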
We applied 3-fold cross-validation testing. We divided the whole dataset into three parts. In the first fold, the first two parts are used for training and the third part for testing. In the second fold, the first and third parts are used for training and the second part for testing. In the third fold, the second and third parts are used for training and the first part for testing. The average error across all three trials is computed. The advantage of this method is that it matters less how the data are divided: every data point is in the test set exactly once and in the training set the remaining two times. We compared our results with existing work; detailed comparative results are given in Table III.
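The three folds described above can be generated as follows. Shuffling the samples before splitting, the seed and the helper name are assumptions; the paper does not say how the three parts were formed.

```python
import numpy as np


def three_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) for the three folds described in the text."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    parts = np.array_split(idx, 3)
    # Fold 1 tests on part 3, fold 2 on part 2, fold 3 on part 1.
    for test_part in (2, 1, 0):
        train = np.concatenate([parts[j] for j in range(3) if j != test_part])
        yield train, parts[test_part]
```

The reported average error would then be the mean of the three per-fold test errors; with 1500 samples, each fold trains on 1000 and tests on 500.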
Table I. Results of three different MLPs

MLP | Input layer neurons | Hidden layer neurons | Output layer neurons | Result
Chain code histogram features | 200 | 50 | 20 | 88.19%
Shadow features | 32 | 15 | 20 | 81.25%
View based features | 44 | 30 | 20 | 60.07%

Table II. Top choices results

S. No. | Proposed method result | Accuracy obtained
1 | Top 1 choice | 89.58%
2 | Top 2 choices | 94.79%
3 | Top 3 choices | 97.57%
4 | Top 4 choices | 98.26%
5 | Top 5 choices | 98.61%

Table III: Comparison of results

S. No. | Method proposed by | Accuracy
1 | Kumar and Singh [7] | 80%
2 | N. Sharma, U. Pal, F. Kimura, and S. Pal [1] | 80.36%
3 | M. Hanmandlu, O.V.R. Murthy, V.K. Madasu [14] | 90.65%
5 | Proposed method | 98.61%

VI. CONCLUSION
India is a multilingual and multi-script country comprising eleven different scripts. Devnagari is the third most widely used script; it is used for several major languages such as Hindi, Sanskrit, Marathi and Nepali, by more than 500 million people. However, little work has been done on off-line handwriting recognition of the Devnagari script. In this paper we presented a technique for the recognition of offline handwritten Devnagari characters using MLPs. In the future we plan to experiment with other feature extraction methods to obtain higher recognition accuracy from our system.
ACKNOWLEDGMENT
The authors are thankful to the “Centre for Microprocessor Application for Training Education and Research” and the “Project on Storage Retrieval and Understanding of Video for Multimedia”, at the Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, for providing the necessary facilities for carrying out this work. The first author gratefully acknowledges the support of the Meghnad Saha Institute of Technology for carrying out this research work.
REFERENCES
[1] N. Sharma, U. Pal, F. Kimura, and S. Pal, “Recognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier”, ICVGIP 2006, LNCS 4338, pp. 805-816, 2006.
[2] U. Pal and B.B. Chaudhuri, “Indian script character recognition: A survey”, Pattern Recognition, Vol. 37, pp. 1887-1899, 2004.
[3] B.B. Chaudhuri and U. Pal, “A complete printed Bangla OCR system”, Pattern Recognition, Vol. 31, pp. 531-549, 1998.
[4] I.K. Sethi and B. Chatterjee, “Machine Recognition of Constrained Hand Printed Devnagari”, Pattern Recognition, Vol. 9, pp. 69-75, 1977.
[5] M. Hanmandlu and O.V. Ramana Murthy, “Fuzzy Model Based Recognition of Handwritten Hindi Numerals”, Intl. Conf. on Cognition and Recognition, pp. 490-496, 2005.
[6] Reena Bajaj, Lipika Dey, and S. Chaudhury, “Devnagari numeral recognition by combining decision of multiple connectionist classifiers”, Sadhana, Vol. 27, Part 1, pp. 59-72, 2002.
[7] Satish Kumar and Chandan Singh, “A Study of Zernike Moments and its use in Devnagari Handwritten Character Recognition”, Intl. Conf. on Cognition and Recognition, pp. 514-520, 2005.
[8] I.K. Sethi and B. Chatterjee, “Machine Recognition of Constrained Hand Printed Devnagari”, Pattern Recognition, Vol. 9, pp. 69-75, 1977.
[9] U. Bhattacharya, B.B. Chaudhuri, R. Ghosh and M. Ghosh, “On Recognition of Handwritten Devnagari Numerals”, In Proc. of the Workshop on Learning Algorithms for Pattern Recognition (in conjunction with the 18th Australian Joint Conference on Artificial Intelligence), Sydney, pp. 1-7, 2005.
[10] S. Arora, D. Bhattacharjee, M. Nasipuri, L. Malik, “Classification of Gradient Change Features Using MLP for Handwritten Character Recognition”, Emerging Applications of Information Technology (EAIT), Kolkata, India, 2006.
[11] S. Arora, D. Bhattacharjee, M. Nasipuri, L. Malik, “A Novel Approach for Handwritten Devnagari Character Recognition”, International Conference on Signal and Image Processing (ICSIP), Hubli, Karnataka, India, 2006.
[12] S. Arora, D. Bhattacharjee, M. Nasipuri, L. Malik, “A Two Stage Classification Approach for Handwritten Devanagari Characters”, International Conference on Computational Intelligence and Multimedia Applications (ICCIMA07), Sivakasi, Tamil Nadu, India, 2007.
[13] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, “Handwritten Bangla alphabet recognition using MLP based classifier”, NCCPB, Bangladesh, 2005.
[14] M. Hanmandlu, O.V. Ramana Murthy, Vamsi Krishna Madasu, “Fuzzy Model based recognition of Handwritten Hindi characters”, Digital Image Computing Techniques and Applications, IEEE Computer Society, 2007.
[15] H. Freeman, “On the Encoding of Arbitrary Geometric Configurations”, IRE Trans. on Electronic Computers, Vol. EC-10, No. 2, pp. 260-268, June 1961.
... This was followed by attempting recognition of handwritten numerals in Devanagari [6]. As the work progressed to handwritten character recognition in Devanagari, attempts were made to find the features that could represent the characters efficiently [7][8][9]. Extensive research is carried out in finding various features and the classifier combinations to improve the recognition rate [10]. Several researchers also applied multiple features in order to improve the accuracy of the recognition engine. ...
... Various statistical parameters are calculated to check the invariance. The similarity is checked by obtaining the cross correlation of the features extracted as indicated in (8). The correlation coefficient 'corr_coeff' generates a maximum value if the features of the database image and rotated image match. ...
... The 'corr_coeff' varies between +1 to -1, where +1 indicates exact match and -1 indicates complete mismatch between the two images. The 'corr_coeff' is obtained by, (8) Where, f and g are the mean of the database character features and the rotated character features respectively. ...
... precision. A fluffy model-based acknowledgment approach has announced by M. Hanmandlu [22]. The highlights are removed by the case approach which separated the character into 24 cells ( [17] in 1976. ...
... Levene's test, t-test and ANOVA (Analysis of Variance) statistical tools provides parametric and inferential significance of the extracted features by analyzing means and variances for different DR lesions. Design of experiment application of SPSS package is utilized for statistical evaluation of the extracted features [22]. ...
Chapter
Full-text available
Natural language processing is a branch of computer science and artificial intelligence which is concerned with interaction between computers and human languages. Natural language processing is the study of mathematical and computational modelling of various aspects of language and the development of a wide range of systems. These include the spoken language systems that integrate speech and natural language. Natural language processing has a role in computer science because many aspects of the field deal with linguistic features of computation. Natural language processing is an area of research and application that explores how computers can be used to understand and manipulates natural language text or speech to do useful things. The applications of Natural language processing include fields of study, such as machine translation, natural language text processing and summarization, user interfaces, multilingual and cross language information retrieval (CLIR), speech recognition, artificial intelligence (AI) and expert systems.
... Different feature extraction techniques implemented for HDCR, quadratic based classifier on the histogram of directional chain code features [16] achieved 98.86% accuracy with the dataset size of 11,270 handwritten Devanagari Characters. The most commonly used classifiers are MLP based classifiers [17][18][19], SVM [18,20,21] Neural Networks [22,23] or Modified Neural Networks for feature-based classification. ...
Article
Full-text available
Despite many advances, Handwritten Devanagari Character Recognition (HDCR) remains unsolved due to the presence of complex characters. For HDCR, the traditional feature extraction and classification techniques are limited to the datasets developed in the respective laboratory that are not available publicly. A standard benchmarking dataset is not available for HDCR that helps to develop deep learning models. To progress the performance of HDCR, in this study, we produced a dataset of 38,750 images of Devanagari numerals, and vowels are generated and made publicly available for fellow researchers in this domain. This data is collected from more than 3000 subjects of different age groups. Each character is extracted by a segmentation technique proposed here, which is limited to this application. Experiments are conducted on the dataset; three different Convolution Neural Networks (CNN) architecture is developed. 1. CNN, 2. Modified Lenet CNN (MLCNN) and 3. Alexnet CNN (ACNN). A Modified LCNN is proposed by changing the architecture of Lenet 5 CNN. Regular Lenet 5 has \(\mathrm{tanh}(x)\) as its activation function. Since the Devangari characters are nonlinear, non-linearity is introduced in the Networks by using Rectified Linear Unit. This solves the problem of vanishing gradient problem by \(\mathrm{tanh}(x)\). We achieved a recognition rate of 96% on training data and 94% on unseen data using CNN. MLCNN obtained an accuracy rate of 99% and 94% with less computational cost. Whereas, ACNN attained a recognition rate of 99% and 98% on unseen data. A series of experiments were conducted on the data with different combination splits of data and found a minimum loss of 0.001%. Such developments fill a significant percentage of the huge gap between real-world requirements and the actual performance of Devanagari recognizers.
... This methodology has been tried on 50,000 examples and gotten 89.12% exactness. In [21], S. Arora consolidated various highlights, for example, chain codes, four side perspectives, and shadow based highlights. These highlights were taken care of into a multilayer perceptron neural organization to perceive 1500 manually written Devanagari characters and get 89.58% ...
Article
Manually written character acknowledgment is as of now getting the consideration of scientists in view of potential applications in helping innovation for dazzle and outwardly hindered clients, human–robot collaboration, programmed information passage for business reports, and so on. In this work, we propose a strategy to perceive transcribed Devanagari characters utilizing profound convolutional neural organizations (DCNN) which are one of the ongoing procedures embraced from the profound learning network. We tested the ISIDCHAR information base gave by (Information Sharing Index) ISI, Kolkata and V2DMDCHAR information base with six distinct structures of DCNN to assess the exhibition and furthermore research the utilization of six as of late created versatile inclination strategies. A layer-wise method of DCNN has been utilized that assisted with accomplishing the most noteworthy acknowledgment exactness and furthermore get a quicker union rate. The consequences of layer-wise-prepared DCNN are great in correlation with those accomplished by a shallow strategy of high quality highlights and standard DCNN
... Then, when attempts were made for recognition of Devanagari handwritten numerals [5] and characters, these features failed to classify the characters accurately and there was a need of sophisticated feature extractor and classifier. As the work progressed to handwritten character recognition in Devanagari, attempts were made to find the features that could represent the characters efficiently [6][7][8]. Different researchers implemented different feature extraction and classification techniques for Devanagari character recognition. Researchers also tried a combination of several feature extraction and classification techniques in order to improve the performance [9]. ...
Article
Full-text available
The character recognition system is a vital area in the field of pattern recognition. One interesting, complex, and challenging task is handwritten character recognition because of various writing styles of individuals. The accuracy of such systems highly depends upon the extraction and selection of features. Many researchers proposed a variety of feature extraction and classification methods for various scripts including Devanagari. In view of that, this article presents a broad study of feature extraction and classification methods considered so far for online and offline Handwritten Character Recognition (HCR) for Devanagari script, which is essential in Optical Character Recognition (OCR) research. This article presents techniques used by authors, the dataset used, the accuracy achieved by the methods of the work already available for the OCR research. This article is depicting the latest studies, research gaps, challenges and future perspectives for the researchers working in the Devanagari text recognition domain. Moreover, methods developed for feature extraction and classification in the area of Devanagari character recognition are presented in a systematic way as an assistance for future researchers. It has been gathered that traditional feature extraction and classifications methods are being replaced with deep learning methods to achieve higher recognition accuracy in this area.
Article
Full-text available
This paper presents approach for the recognition of handwritten Marathi consonants. In order to recognize handwritten Marathi consonants, a database of handwritten Marathi consonants is developed to carry recognition experiments. Problem of handwritten Marathi consonant recognition is simplified using multilevel classificationwhich improves recognition rate. Total 36 Marathi consonants are transformed using instance simplification technique into six sub classesdepending on special property of consonants. Suitable features are extracted from different sub classes and further classification is carried out using SVM and k-NN classifiers.We have used database of 7920 characters for testing and found recognition accuracy 78.27% using SVM classifier and 73.29% using k-NN classifier.
Article
The handwriting style of every writer consists of variations, skewness and slanting nature and therefore, it is a stimulating task to recognise these handwritten documents. This article presents a study on various methods available in literature for Devanagari handwritten character recognition and performs its implementation using Convolutional neural network (CNN). Available methods are studied on different parameters and a tabular comparison is also presented which concludes superiority of CNN model in character recognition task. The proposed CNN model results in well acceptable accuracy using dropout and stochastic gradient descent (SGD) optimizer.
Article
Full-text available
A method is presented for the machine recognition of constrained, hand printed Devanagari characters. A set of very simple primitives is used, and all the Devanagari characters are looked upon as a concatenation of these primitives. Most of the decisions are taken on the basis of the presence/absence or positional relationship of these primitives; and the decision process is a multistage process, where each stage of decision making narrows down the choice regarding the class membership of the input token.
Conference Paper
Full-text available
Recognition of handwritten characters is a challenging task because of the variability involved in the writing styles of different individuals. In this paper we propose a quadratic classifier based scheme for the recognition of off- line Devnagari handwritten characters. The features used in the classifier are obtained from the directional chain code information of the contour points of the characters. The bounding box of a character is segmented into blocks and the chain code histogram is computed in each of the blocks. Based on the chain code histogram, here we have used 64 dimensional features for recognition. These chain code features are fed to the quadratic classifier for recognition. From the proposed scheme we obtained 98.86% and 80.36% recognition accuracy on Devnagari numerals and characters, respectively. We used five- fold cross-validation technique for result computation.
Article
This paper is concerned with recognition of handwritten Devnagari numerals. The basic objective of the present work is to provide an efficient and reliable technique for recognition of handwritten numerals. Three different types of features have been used for classification of numerals. A multi-classifier connectionist architecture has been proposed for increasing reliability of the recognition results. Experimental results show that the technique is effective and reliable.
Article
A method is described which permits the encoding of arbitrary geometric configurations so as to facilitate their analysis and manipulation by means of a digital computer. It is shown that one can determine through the use of relatively simple numerical techniques whether a given arbitrary plane curve is open or closed, whether it is singly or multiply connected, and what area it encloses. Further, one can cause a given figure to be expanded, contracted, elongated, or rotated by an arbitrary amount. It is shown that there are a number of ways of encoding arbitrary geometric curves to facilitate such manipulations, each having its own particular advantages and disadvantages. One method, the so-called rectangular-array type of encoding, is discussed in detail. In this method the slope function is quantized into a set of eight standard slopes. This particular representation is one of the simplest and one that is most readily utilized with present-day computing and display equipment.
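The operations described in this abstract can be sketched on an 8-direction chain code. The sketch is a minimal, assumption-laden illustration and not Freeman's original formulation. It decodes the code into lattice points, tests closure (the unit steps sum to zero displacement), computes the enclosed area with the shoelace formula, and rotates by multiples of 90 degrees. Odd multiples of 45 degrees are not exact on the lattice because diagonal steps are √2 long, so the sketch restricts rotation to quarter turns. All function names are hypothetical.

```python
# Freeman's eight standard slopes as (dx, dy) steps, y pointing up:
# code 0 = east, increasing counter-clockwise in 45-degree increments.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def decode(chain, start=(0, 0)):
    """Turn a chain code into the list of lattice points it visits."""
    x, y = start
    points = [(x, y)]
    for code in chain:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def is_closed(chain):
    """A curve is closed when its unit steps sum to a zero displacement."""
    return decode(chain)[-1] == (0, 0)

def enclosed_area(chain):
    """Area of a closed, non-self-intersecting chain via the shoelace formula."""
    points = decode(chain)
    if points[-1] != points[0]:
        raise ValueError("chain code describes an open curve")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return abs(twice_area) / 2

def rotate_quarter_turns(chain, k):
    """Rotate by k * 90 degrees: each code shifts by 2k modulo 8."""
    return [(code + 2 * k) % 8 for code in chain]

square = [0, 0, 2, 2, 4, 4, 6, 6]  # 2x2 square, traced counter-clockwise
print(is_closed(square), enclosed_area(square))  # True 4.0
print(is_closed([0, 0, 2]))  # False
```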
Article
A complete Optical Character Recognition (OCR) system for printed Bangla, the fourth most popular script in the world, is presented. This is the first OCR system among all script forms used in the Indian subcontinent. The problem is difficult because (i) there are about 300 basic, modified and compound character shapes in the script, (ii) the characters in a word are topologically connected and (iii) Bangla is an inflectional language. In our system, the document image captured by a flatbed scanner undergoes skew correction, text-graphics separation, line segmentation, zone detection, and word and character segmentation, using a mix of conventional and newly developed techniques. From zonal information and shape characteristics, the basic, modified and compound characters are separated for the convenience of classification. The basic and modified characters, which number about 75 and account for about 96% of the text corpus, are recognized by a structural-feature-based tree classifier. The compound characters are recognized by a tree classifier followed by a template-matching approach. The feature detection is simple and robust, and preprocessing such as thinning and pruning is avoided. Character unigram statistics are used to make the tree classifier efficient, and several heuristics are used to speed up template matching. A dictionary-based error-correction scheme is used, with separate dictionaries compiled for root words and suffixes that also contain morpho-syntactic information. For clear, single-font documents, 95.50% word-level recognition accuracy (equivalent to 99.10% at the character level) has been obtained. Extension of the work to Devnagari, the third most popular script in the world, is also discussed.
Article
This paper presents the recognition of handwritten Hindi and English numerals by representing them in the form of exponential membership functions which serve as a fuzzy model. The recognition is carried out by modifying the exponential membership functions fitted to the fuzzy sets. These fuzzy sets are derived from features consisting of normalized distances obtained using the Box approach. The membership function is modified by two structural parameters that are estimated by optimizing the entropy subject to the attainment of membership function to unity. The overall recognition rate is found to be 95% for Hindi numerals and 98.4% for English numerals.
Article
Intensive research has been done on optical character recognition (OCR), and a large number of articles on this topic have been published over the last few decades. Many commercial OCR systems are now available, but most of them work for Roman, Chinese, Japanese and Arabic characters. Relatively little work has been done on Indian language character recognition, although India has 12 major scripts. In this paper, we present a review of the OCR work done on Indian language scripts. The review is organized into 5 sections. Sections 1 and 2 cover the introduction and the properties of Indian scripts. Section 3 discusses different methodologies in OCR development and the research done on Indian script recognition. Section 4 discusses the scope of future work and the further steps needed for Indian script OCR development. Section 5 concludes the paper.
Article
The work presented here involves the design of a Multi Layer Perceptron (MLP) based classifier for recognition of the handwritten Bangla alphabet using a 76-element feature set. Bangla is the second most popular script and language in the Indian subcontinent and the fifth most popular language in the world. The feature set developed for representing handwritten characters of the Bangla alphabet includes 24 shadow features, 16 centroid features and 36 longest-run features. The recognition performance of the MLP designed for this feature set was observed experimentally as 86.46% on the training samples and 75.05% on the test samples. The work has useful application in the development of a complete OCR system for handwritten Bangla text.
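The abstract lists 24 shadow features without defining them. As a simplified, hypothetical variant (not the cited feature layout), the sketch below splits a binary character image into four quadrants. For each quadrant, it measures the fraction of a horizontal bar and of a vertical bar covered by the shadow cast by the ink, that is, by its orthogonal projection onto the bar, which gives 8 features in total.

```python
def shadow(region):
    """Fractions of a horizontal and a vertical bar covered by the region's
    ink when projected orthogonally onto each bar."""
    n_rows, n_cols = len(region), len(region[0])
    horizontal = sum(any(row[c] for row in region) for c in range(n_cols)) / n_cols
    vertical = sum(any(row) for row in region) / n_rows
    return horizontal, vertical

def quadrant_shadow_features(img):
    """Split a binary image (rows of 0/1, at least 2x2) into quadrants and
    return the two shadow fractions of each quadrant: 8 features."""
    mid_r, mid_c = len(img) // 2, len(img[0]) // 2
    quadrants = [
        [row[:mid_c] for row in img[:mid_r]],  # top-left
        [row[mid_c:] for row in img[:mid_r]],  # top-right
        [row[:mid_c] for row in img[mid_r:]],  # bottom-left
        [row[mid_c:] for row in img[mid_r:]],  # bottom-right
    ]
    features = []
    for quad in quadrants:
        features.extend(shadow(quad))
    return features

diagonal = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]
print(quadrant_shadow_features(diagonal))
# [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```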
Article
A novel, generic scheme for recognizing off-line handwritten English alphabet character images is proposed. The advantage of the technique is that it can be applied generically to different applications and is expected to perform better in uncertain and noisy environments. The recognition scheme uses a multilayer perceptron (MLP) neural network. The system was trained and tested on a database of 300 samples of handwritten characters. For improved generalization and to avoid overtraining, the available dataset was divided into two subsets: a training set and a test set. We achieved 99.10% and 94.15% correct recognition rates on the training and test sets, respectively. The proposed scheme is robust with respect to various writing styles and sizes as well as the presence of considerable noise.
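The abstract does not give the network's topology. As a hedged sketch of what an MLP classifier computes at recognition time, the code below runs a forward pass through one sigmoid hidden layer and a softmax output layer. The layer sizes and weights are made-up values, and training (for example, by backpropagation) is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def affine(x, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def mlp_predict(x, w1, b1, w2, b2):
    """Forward pass: sigmoid hidden layer, then softmax class probabilities."""
    hidden = [sigmoid(z) for z in affine(x, w1, b1)]
    return softmax(affine(hidden, w2, b2))

# Hypothetical 2-input, 2-hidden-unit, 2-class network.
w1, b1 = [[2.0, -1.0], [-1.0, 2.0]], [0.0, 0.0]
w2, b2 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
probs = mlp_predict([1.0, 0.0], w1, b1, w2, b2)
print(max(range(len(probs)), key=probs.__getitem__))  # 0
```

The predicted class is the index of the largest output probability. Ranking all outputs instead of taking only the maximum yields the top-k candidate lists that such systems often report.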