Figure 1 - uploaded by Manjunath Aradhya
Kannada language: 49 phonemic letters.

Source publication
Conference Paper
Character segmentation has become a crucial task for character recognition in many OCR systems. It is an important step because incorrectly segmented characters are unlikely to be recognized correctly. Segmenting cursive scripts is more challenging because of the presence of more touching characters. Kannada is one of the popular languages...

Context in source publication

Context 1
... we proposed a new character segmentation algorithm for Kannada script. The outline of the paper is as follows: Section 2 explains the properties of Kannada script, Section 3 explains the proposed methodology, experimental results are presented in Section 4, and conclusions are drawn at the end.

Kannada script is written horizontally from left to right and, unlike English, has no distinction between upper and lower case. Because Kannada characters are formed by combining basic symbols, and because the character set is large (vowels, consonants and compound characters), segmenting Kannada characters is a complex and challenging task. Some characters may also overlap one another. Kannada text is therefore more difficult to process than Latin-based scripts because of its structural complexity. The Kannada language uses 49 phonemic letters, shown in Figure 1, divided into three groups: vowels (Swaragalu, 15, including the Anusvara (ಂ) and the Visarga (ಃ)), consonants (Vyanjanagalu, 34) and modifier glyphs (half-letters). The modifier glyphs derived from the 15 vowels alter the 34 base consonants, creating a total of (34 × 15) + 34 = 544 characters; samples of modifier glyphs are shown in Figure 2. In addition, a consonant emphasis glyph, called a consonant conjunct in Kannada (vattakshar, also called an extra modifier in English), exists for each of the 34 consonants. This gives a total of (544 × 34) + 15 = 18511 distinct characters [10]; samples of extra modifiers are shown in Figure 3.

This section presents the proposed methodology for unconstrained handwritten Kannada character segmentation. Initially, all components in a word image are detected by a connected component analysis (CCA) algorithm, as shown in Figure 4. For a component $c_i$, its height and width are denoted $h_i$ and $w_i$, respectively. Components whose height and width are at or below the average are treated as segmented characters; the remaining components are considered touching components.
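The component-detection and size-filtering step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a binary image given as a list of rows (1 = ink), uses 8-connectivity, and interprets "average and below average height and width" as both $h_i$ and $w_i$ being at most the mean over all components. The function names `connected_components` and `split_by_average_size` are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Label 8-connected foreground pixels (value 1) and return one bounding
    box (top, left, bottom, right) per component, in row-major discovery order."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_by_average_size(boxes):
    """Components with height and width both <= the mean are treated as
    isolated characters; all others are treated as touching components."""
    if not boxes:
        return [], []
    heights = [box[2] - box[0] + 1 for box in boxes]
    widths = [box[3] - box[1] + 1 for box in boxes]
    avg_h = sum(heights) / len(boxes)
    avg_w = sum(widths) / len(boxes)
    isolated, touching = [], []
    for box, h, w in zip(boxes, heights, widths):
        (isolated if h <= avg_h and w <= avg_w else touching).append(box)
    return isolated, touching


word = [
    [1, 0, 1, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 1, 1, 1, 1],
]
boxes = connected_components(word)
isolated, touching = split_by_average_size(boxes)
print(isolated)  # [(0, 0, 1, 0), (0, 2, 1, 2)]
print(touching)  # [(0, 4, 1, 8)]
```

In this toy word all three components are 2 pixels high, but the mean width is about 2.33, so only the 5-pixel-wide component is flagged as a touching component.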
To segment these touching components, we follow three steps: thinning, branch points and mixture models. Each step is explained in the following subsections. Kannada script is cursive in nature, and statistical analysis shows that most touching portions lie within the lower half of the character height. These touchings are caused mainly by modifiers and extra modifiers. To locate the touching portion in a component, we apply a morphological thinning operation for further processing, as shown in ...
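The thinning and branch-point steps can be illustrated with the sketch below. The text only says "morphological thinning"; the Zhang-Suen algorithm used here is a swapped-in, commonly used thinning method, and defining a branch point as a skeleton pixel whose 8-neighbourhood has three or more 0 to 1 transitions (crossing number of at least 3) is likewise an assumption. The optional `min_row` argument is a hypothetical way to keep only branch points in the lower part of a component, following the observation that most touching portions lie in the lower half. All function names are hypothetical, and pixels outside the image are treated as background.

```python
# Neighbour offsets P2..P9, clockwise starting from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbours(img, y, x):
    """Return P2..P9 for pixel (y, x); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    return [img[y + dy][x + dx] if 0 <= y + dy < rows and 0 <= x + dx < cols else 0
            for dy, dx in OFFSETS]


def transitions(p):
    """Number of 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    return sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)


def zhang_suen_thin(img):
    """Zhang-Suen thinning of a binary image (1 = ink); returns a new image."""
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for first_pass in (True, False):
            marked = []
            for y in range(len(img)):
                for x in range(len(img[0])):
                    if not img[y][x]:
                        continue
                    p = neighbours(img, y, x)
                    p2, _, p4, _, p6, _, p8, _ = p
                    if first_pass:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= sum(p) <= 6 and transitions(p) == 1 and ok:
                        marked.append((y, x))
            for y, x in marked:  # delete simultaneously after the scan
                img[y][x] = 0
            changed = changed or bool(marked)
    return img


def branch_points(skel, min_row=0):
    """Skeleton pixels with crossing number >= 3, restricted to rows >= min_row."""
    return [(y, x) for y in range(min_row, len(skel)) for x in range(len(skel[0]))
            if skel[y][x] and transitions(neighbours(skel, y, x)) >= 3]


t_shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(branch_points(t_shape))  # [(1, 2)]
```

The crossing number is used instead of a plain neighbour count because a pixel just below a junction (here (2, 2)) has four ink neighbours but is not itself a branch point.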

Similar publications

Article
The robustness of a typical handwritten character recognition system relies on the availability of comprehensive supervised data samples. Considerable work has been reported in the literature on creating databases for several Indic scripts, but the Tamil script has only one standardized database to date. This paper presents the work...
Conference Paper
Unconstrained handwriting recognition is an essential task in document analysis. It is usually carried out in two steps. First, the document is segmented into text lines. Second, an Optical Character Recognition model is applied on these line images. We propose the Simple Predict & Align Network: an end-to-end recurrence-free Fully Convolutional Ne...
Research
Optical character recognition (OCR) is a technique for recognizing characters from optically scanned and digitized pages. OCR plays an important role in Indian script research. The official language of the state of Odisha is Odia. OCR faces great difficulties in recognizing Odia because of similarly shaped characters, their complex nature, the compl...
Article
Traditional systems of handwriting recognition have relied on handcrafted features and a large amount of prior knowledge. Training an Optical character recognition (OCR) system based on these prerequisites is a challenging task. Research in the handwriting recognition field is focused around deep learning techniques and has achieved breakthrough pe...
Preprint
Unconstrained handwriting recognition is an essential task in document analysis. It is usually carried out in two steps. First, the document is segmented into text lines. Second, an Optical Character Recognition model is applied on these line images. We propose the Simple Predict & Align Network: an end-to-end recurrence-free Fully Convolutional Ne...

Citations

... Different character segmentation and recognition techniques used for handwritten words, cursive words and numerals up to 1996 are discussed in [17]. Various papers have been published on segmentation in English [4], Bangla [21,22], Punjabi [11,12,19] and Kannada [18,20]. Segmentation of words in Roman script using a Potential Segmentation Column (PCS) approach is given by A. Chaudhary et al. [5]. ...
Article
In this paper, the SFF (Segmentation Facilitate Feature) technique is proposed to find the junction path for segmenting touched components, based on a seed pixel selected among candidate pixels. Handwriting recognition systems have a number of applications, such as reading postal addresses, filling forms and reading bank cheques, each of which poses several challenges. In practice, the constituents of word images become touched in handwritten data due to stroke variability and lack of space, which makes extracting individual characters from a word image more complicated. Segmenting individual characters in a word image requires a technique that accounts for the variability of writing. The SFF technique finds a seed pixel among candidate pixels based on its 3 neighbouring pixels. It is used to find junction pixels that form a junction path separating the touched components. The junction path is selected so as to avoid issues arising from artifacts or deletion of component features. For experimentation, 1840 legal-amount words containing touching components are used: 250 words from a benchmark database (ICDAR) and 1590 words gathered from 15 different writers. Applying the SFF technique to this database achieves 89.9% accuracy, and a higher accuracy of 96.2% is achieved on 1000 words containing two touching consonants.
... The expectation-maximization algorithm was used to learn a mixture of Gaussians. Cluster mean points were used to calculate the directions and branch points for segmenting characters in an image [6]. ...
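Fitting a Gaussian mixture with expectation maximization, and using the resulting cluster means to obtain a segmentation direction, can be sketched as follows. The details are assumptions, not taken from the cited work: the points are 2-D foreground pixel coordinates (x, y), each component has a diagonal covariance, the initial means are evenly spaced points after sorting by x, and the "direction" is taken as the angle of the line joining two cluster means. `em_gmm_2d` and `direction_deg` are hypothetical names.

```python
import math


def em_gmm_2d(points, k, iters=50):
    """Fit a k-component 2-D Gaussian mixture with diagonal covariances by EM.
    Returns (weights, means, variances)."""
    n = len(points)
    # Deterministic initialisation: evenly spaced points after sorting by x.
    ordered = sorted(points)
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    gx = sum(p[0] for p in points) / n
    gy = sum(p[1] for p in points) / n
    init_var = [sum((p[0] - gx) ** 2 for p in points) / n + 1e-6,
                sum((p[1] - gy) ** 2 for p in points) / n + 1e-6]
    variances = [init_var[:] for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised in log space to avoid underflow.
        # The constant -log(2*pi) is omitted because it cancels in the normalisation.
        resp = []
        for x, y in points:
            logs = [math.log(w) - 0.5 * ((x - mx) ** 2 / vx + (y - my) ** 2 / vy)
                    - 0.5 * math.log(vx * vy)
                    for w, (mx, my), (vx, vy) in zip(weights, means, variances)]
            top = max(logs)
            unnorm = [math.exp(lp - top) for lp in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights, means and per-axis variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            vx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj + 1e-6
            vy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj + 1e-6
            weights[j], means[j], variances[j] = nj / n, [mx, my], [vx, vy]
    return weights, means, variances


def direction_deg(a, b):
    """Angle in degrees of the line from cluster mean a to cluster mean b."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
weights, means, _ = em_gmm_2d(pts, k=2)
print(round(direction_deg(means[0], means[1]), 3))  # 45.0
```

Normalising responsibilities in log space keeps points far from every mean from producing an all-zero row, which would otherwise silently drop them from the M-step.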
Article
Kannada handwritten reports were the only form of documentation available in government offices and healthcare departments in Karnataka state. Reproducing the contents of these old documents through typewriting is a tedious task, as the documents are difficult to read and understand. Hence there is a need for a computer-based system to bridge the gap between machines and humans. The paper proposes an efficient Kannada handwritten character recognition system that uses image preprocessing techniques to enhance image quality and explores deep learning techniques for feature extraction. The layout of the proposed method is kept simple and easy for a user to understand. The Chars74K dataset was used for experimentation. Experiments were performed on handwritten Kannada vowels and consonants consisting of 25 handwritten characters in 657 classes. To validate the model, the categorical cross-entropy loss function was used over 15 epochs to measure the error rate. The model gave a prediction performance of 95.11% on the training set and 86% on the test set. The proposed model would be highly useful in government sectors for documentation purposes.
... The efficiency of segmentation techniques depends on speed, efficient matching of image outline attributes, enhancement of contour connectivity, etc. [1]. The expectation-maximization algorithm, applied to a mixture of Gaussian functions, has been used to segment handwritten Kannada characters, achieving an 85.5% accuracy rate [2]. To segment text in English, Kannada and Tamil scripts, the Stroke Width Transform (SWT) and GrabCut algorithms have been implemented. ...
... $\sigma_{xy}$ is the estimated covariance between the two images being compared. To avoid instability when the denominator is weak (close to zero), the constants are set to $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, where $L$ is the dynamic range of the pixel values; by default $k_1 = 0.01$ and $k_2 = 0.03$. ...
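These constants are those of the structural similarity (SSIM) index, where $\mu_x, \mu_y$ are the image means, $\sigma_x^2, \sigma_y^2$ the variances and $\sigma_{xy}$ the covariance:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

A minimal sketch follows. It is a simplification: standard SSIM is averaged over local windows, whereas this hypothetical `global_ssim` treats the whole image as one window. Images are assumed to be flat lists of grey levels with $L = 255$ (8-bit data), and sample (N - 1) estimates are used for the variances and covariance.

```python
def global_ssim(x, y, L=255, k1=0.01, k2=0.03):
    """SSIM computed over the whole image as a single window.
    x, y: equal-length flat lists of grey levels; L: dynamic range of pixel values."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    # Sample (N - 1) estimates of variance and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2  # keep weak denominators away from zero
    return (((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
            / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))


img = [0, 50, 100, 150, 200]
print(global_ssim(img, img))  # 1.0
```

Without $c_1$ and $c_2$, two flat, dark patches ($\mu \approx 0$, $\sigma \approx 0$) would make both factors of the denominator nearly zero, which is the instability the constants are meant to prevent.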
... The drawback of this method is that segmentation degrades some parts of the characters. The expectation-maximization algorithm is proposed for learning mixtures of Gaussians [13]. Mixture models are used to find the direction for segmenting touching characters. ...
Conference Paper
Segmentation of handwritten document images is a complex task due to the variability in writing styles. The segmentation technique has to deal with non-uniformly skewed, overlapping and touching lines, and very few works have yet addressed these issues. This paper presents a novel methodology for segmenting handwritten Malayalam documents into their constituent lines, words and characters while addressing these issues. A water flow technique is used to extract text lines, and an algorithm is proposed for dealing with touching and overlapping lines. Words are detected from the text lines using the Spiral Run Length Smearing Algorithm (SRLSA). Skew correction is then applied to the extracted words, and the skew-corrected words are passed on to character segmentation. Skew correction is incorporated to ease the recognition stage in handwritten Malayalam OCR.
... In character segmentation, the word image is divided into small regions, each of which probably contains an isolated character, and each character is then detected by the connected component (CC) technique (Naveena and Aradhya, 2012). The vertical projection profile, a traditional segmentation technique, is also commonly used (Xiu et al., 2006). ...
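A vertical projection profile counts the foreground pixels in each column, and a word can then be cut wherever a column contains no ink. The sketch below assumes a binary image given as a list of rows (1 = ink); `vertical_projection_segments` is a hypothetical name, and cutting only at zero-valued columns is the simplest variant of the technique.

```python
def vertical_projection_segments(img):
    """Split a binary word image at columns containing no foreground pixels.
    Returns inclusive (start_col, end_col) ranges, one per segment."""
    profile = [sum(col) for col in zip(*img)]  # ink count per column
    segments, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x  # entering an inked run
        elif not count and start is not None:
            segments.append((start, x - 1))  # leaving an inked run
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments


word = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
]
print(vertical_projection_segments(word))  # [(0, 1), (3, 4)]
```

Touching characters share ink in every column between them, so a projection profile returns them as a single segment; this is why the works above add dedicated steps for touching components.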
Thesis
Handwritten character recognition plays an important role in transforming raw visual image data, obtained from handwritten documents using scanners for example, into a format that a computer can understand. It is an important application in the fields of pattern recognition, machine learning and artificial intelligence. Several handwritten character recognition systems have already been designed for commercial purposes, such as mail sorting and bank cheque processing. Furthermore, this type of research can help to search through historical handwritten manuscript collections, making the cumulative historical information accessible to a wide public. In this PhD research, several methods are proposed to deal with challenges that occur when recognizing handwritten characters from multiple language scripts. The thesis contributes to all levels of processing isolated character images: from intensity normalization to segmentation, and from feature extraction to the final classification. Moreover, solutions are proposed for recognizing isolated handwritten character images when few handwritten character examples are available. The main goal of the research presented in this dissertation is to study robust feature extraction and machine learning techniques for handwritten character recognition. The best-performing technique combines the histogram of oriented gradients with bags of visual words. Furthermore, a new method for line segmentation, which is part of document layout analysis, is proposed. The novel techniques have been tested on many different scripts, and the results show that they effectively address the problems of line segmentation and character recognition.
Chapter
In this work, we propose a method for segmenting handwritten Kannada characters from answer scripts. Character segmentation plays an important role in a Kannada optical character recognition (OCR) system, because incorrectly segmented characters lead to unrecognized characters. This paper provides an improved segmentation algorithm based on contour and bounding box methods. To improve segmentation accuracy, the system works in two stages: a preprocessing stage and a segmentation stage. The modified method has been successfully tested on 100 real-time documents of Kannada handwritten scripts collected from different schools. The results are very promising, indicating the efficiency of the suggested approach.
Keywords: Answer scripts; Kannada; Preprocessing; Segmentation; OCR; Augmentation
Article
Recognition of Kannada characters is a complex task, as the number of classes in Kannada, considering all combinations of vowels and consonants, is 623,893. In this paper, the complexity is reduced from 623,893 classes to just 313 classes for main aksharas (vowels, consonants, vowel modifiers and consonant modifiers) and 30 classes for vattu aksharas (conjuncts) by using two-line segmentation. A novel CNN model for recognition of printed and handwritten Kannada characters is proposed. CNN models with two, three and four layers and different filter sizes are designed for main aksharas and vattu aksharas. The database consists of a total of 31,300 samples and 3000 samples of printed and handwritten characters of main aksharas and vattu aksharas, respectively. Simulation results revealed that the four-layer CNN architecture is the best model for recognition of Kannada characters. This model achieved recognition accuracies of 98.83% and 99.29% for printed main and vattu aksharas, and 82.50% and 80.92% for handwritten main and vattu aksharas, respectively.