Contexts in source publication

Context 1
... The Mongolian language has a unique writing style that is quite different from Chinese and English, as illustrated in Fig. 1. First, its writing order is vertical, from top to bottom, and columns are ordered from left to right. Second, all letters of a Mongolian word are conglutinated together in the vertical direction to form a backbone. Third, letters take initial, medial, or final presentation forms according to their positions within a ...
Context 2
... The experiments described in this section recognize individual words without context. The recognition accuracy can be improved further by applying a language model. In Fig. 10, we show the recognition accuracy among the top-N candidates. Over 96% of the correct answers are included in the top 2 candidates, and the top-10 recognition accuracy approaches 99%. This indicates that there is potential to improve the recognition performance with a language ...
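Top-N accuracy of this kind is computed from ranked candidate lists; a minimal sketch with made-up data (the function name and the toy candidates are illustrative, not from the paper):

```python
def top_n_accuracy(candidate_lists, references, n):
    """Fraction of samples whose reference appears among the top-n candidates."""
    hits = sum(ref in cands[:n] for cands, ref in zip(candidate_lists, references))
    return hits / len(references)

# Hypothetical ranked recognizer outputs, plus ground-truth words.
cands = [["ab", "ac"], ["xy", "xz"], ["qq", "qr"]]
refs = ["ab", "xz", "zz"]
top1 = top_n_accuracy(cands, refs, 1)  # 1 of 3 correct in top-1
top2 = top_n_accuracy(cands, refs, 2)  # 2 of 3 correct in top-2
```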

Similar publications

Article
Full-text available
Optical Character Recognition is a buzzword in the field of computing. Artificial neural networks have been used to recognize characters for a long time. ANNs can learn and model non-linear and complex relationships, which is important because, in real life, many of the relationships between inputs and outputs are non-linear as...
Preprint
Full-text available
Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents has invaluable practical worth. Optical character recognition is a science that enables the translation of various types of documents or images into analyzable, editable, and searchable data. During the last decade, researchers have used artifi...
Article
Full-text available
Numerous business workflows involve printed forms, such as invoices or receipts, which are often manually digitized so that the data can be searched or stored persistently. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization. Here, processing algorithms need to deal with prevailing environmental factors, such as...
Article
Full-text available
Due to recent widespread advancements in digitized processing, script identification has become prominent. It concerns identifying multilingual scripts. Optical Character Recognition (OCR) has become a promising technique for character recognition. The primary objective of OCR is to optically sense characters that are machine printed/...
Article
Full-text available
Small strain shear modulus is an important parameter indicating soil stiffness in geotechnical engineering. In this study, a multidirectional bender element technique was used to investigate the small strain shear modulus of a calcareous sand from the Persian Gulf. The effect of the coefficient of gradation, mean particle size and stress history on GHH,...

Citations

... In recent decades, many European movable-type printings, such as the ENP [6] and IMPACT [26] datasets, as well as historical manuscripts [10,11,15,30], have been introduced and made available for research. Concerning Mongolian OCR, significant advances have been made in recent years, such as multi-font printed Mongolian document recognition [28], handwriting recognition [9,35], contemporary printing recognition [36], and digitization of woodcut printings [34]. However, there is little research on digitizing movable-type document images, mainly due to the unavailability of a public database. ...
... The dataset additionally contains numeric characters, special symbols, and grid postfix types. For contemporary printings, Zhang et al. [36] synthesized about 200,000 vocabulary words in multiple fonts. However, these data have not yet been made public. ...
... However, segmentation is hard for cursive or handwritten scripts, and errors accumulate in the subsequent recognition step. To avoid the problems caused by segmentation, Zhang et al. [36] proposed a sequence-to-sequence model. The input images are split into multiple frames of equal size, with each pair of adjacent frames overlapping by half a frame. ...
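The overlapping-frame input scheme described here can be sketched as follows. This is a rough sketch, not code from [36]; the toy image size and frame height are invented, and the image is represented simply as a list of pixel rows:

```python
def split_into_frames(image_rows, frame_h):
    """Slice a word image (a list of pixel rows, top to bottom) into frames
    of equal height, with adjacent frames overlapping by half a frame
    (stride = frame_h // 2)."""
    stride = max(frame_h // 2, 1)
    n_rows = len(image_rows)
    frames = []
    for top in range(0, max(n_rows - frame_h, 0) + 1, stride):
        frames.append(image_rows[top:top + frame_h])
    return frames

# A dummy 12-row image sliced into 4-row frames: frames start at rows 0, 2, 4, 6, 8.
img = [[0] * 5 for _ in range(12)]
frames = split_into_frames(img, 4)  # 5 frames, each 4 rows tall
```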
Article
Full-text available
OCR approaches have advanced widely in recent years thanks to the resurgence of deep learning. However, to the best of our knowledge, there is little work on Mongolian movable-type document recognition. One major hurdle is the lack of a well-labeled, domain-specific dataset for training robust models. This paper aims to create the first Mongolian movable-type text-image dataset for OCR research. We collated 771 paragraph-level pages segmented from 34 newspapers published from 1947 to 1952. For each page, word- and line-level text transcriptions and boundary annotations are recorded. The dataset consists of 86,578 word appearances and 9,711 text-line images in total, with a vocabulary of 7,964 words. It was established from scratch through image collection, text transcription, text-image alignment, and manual correction. Moreover, an official train/test partition is defined, on which typical text segmentation and recognition experiments are run to establish strong baselines. This dataset is available for research, and we encourage researchers to develop and test new methods using it.
... Therefore, the segmentation of characters greatly affects the accuracy of recognition. Due to the special nature of Mongolian described above, many scholars have in recent years adopted a segmentation-free strategy [5][6][7]. Datasets for studying Mongolian OCR are relatively easy to obtain. ...
... Datasets for studying Mongolian OCR are relatively easy to obtain. In 2017, Zhang et al. 5 proposed a sequence-to-sequence model with attention to recognize non-segmented printed Mongolian text, and the recognition accuracy in this experiment reached 89.6%. The dataset used in the experiment belongs to the authors and contains about 20,000 words with a total of 80,000 samples. ...
Article
Full-text available
This paper introduces a new traditional Mongolian word-level online handwriting dataset, MOLHW. The dataset consists of handwritten Mongolian words, including 164,631 samples written by 200 writers and covering 40,605 common Mongolian words. These words were selected from a large Mongolian corpus. The coordinate points of words were collected by volunteers, who wrote the corresponding words in a dedicated application on their mobile phones. Latin transliteration of Mongolian was used to annotate the coordinates of each word. At the same time, the writer’s identification number and mobile phone screen information were recorded in the dataset. Using this dataset, we propose an encoder–decoder Mongolian online handwriting recognition model with a deep bidirectional gated recurrent unit and attention mechanism as the baseline evaluation model. With this model, the best word error rate (WER) on the test set was 24.281%. Furthermore, we present the experimental results of different Mongolian online handwriting recognition models. The results show that, compared with the other models, a Transformer-based model could learn the corresponding character sequences from the coordinate data more effectively, with a 16.969% WER on the test set. The dataset is now freely available to researchers worldwide. It can be applied to handwritten text recognition as well as handwritten text generation, handwriting identification, and signature recognition.
... Currently, Mongolian text recognition can be divided into two categories: letter recognition [6,17,20,23] and end-to-end text recognition [22,26]. Letter-based methods need to divide Mongolian words into letters. ...
... With the development of end-to-end text recognition networks, Zhang et al. [26] proposed in 2017 a sequence-to-sequence attention model to recognize non-segmented printed Mongolian text. The network's encoder comprises deep neural networks (DNN) and long short-term memory (LSTM) layers. ...
Article
Full-text available
Mongolian is a language spoken in Inner Mongolia, China. In the recognition process, the image and text may be deformed due to the shooting angle and other factors, which causes certain difficulties in recognition. This paper proposes a triplet attention Mogrifier network (TAMN) for printed Mongolian text recognition. The network uses a spatial transformation network to correct deformed Mongolian images. It uses gated recurrent convolution layers (GRCL) combined with a triplet attention module to extract image features from the corrected images. A Mogrifier long short-term memory (LSTM) network captures the contextual sequence information in the features, and the decoder’s LSTM attention finally produces the prediction result. Experimental results show that the spatial transformation network can effectively handle deformed Mongolian images, and the recognition accuracy reaches 90.30%. The network achieves good performance in Mongolian text recognition compared with current mainstream text recognition networks. The dataset is publicly available at https://github.com/ShaoDonCui/Mongolian-recognition.
... The model is optimized with the Adam optimizer. We adopt the data preprocessing method of [21] for Mongolian word recognition. Specifically, each word image is divided into frames of equal size, with the height of each frame set to 300 pixels. ...
Chapter
Full-text available
The scarcity of Mongolian handwritten data greatly limits the accuracy of Mongolian handwritten script recognition. Most existing augmentation methods generate new samples by applying holistic transformations to existing samples, which cannot fully reflect the variation of Mongolian handwritten words. Based on the characteristics of Mongolian words, this paper proposes a local augmentation approach for Mongolian handwriting data that effectively improves the diversity of the augmented samples. We apply local variations to the strokes by moving the endpoints and the out-stroke control point of each stroke and reconstructing the strokes with Bezier splines. The overall generation process is flexible and controllable. Experiments on few-shot Mongolian handwritten OCR demonstrate that our approach significantly improves recognition accuracy and outperforms holistic augmentation methods.
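The local stroke variation described above, moving control points and rebuilding strokes with Bezier splines, might be sketched roughly as follows. The jitter range, resampling count, and function names are invented for illustration and are not taken from the paper:

```python
import random

def bezier_point(ctrl, t):
    """De Casteljau evaluation of a Bezier curve at parameter t in [0, 1]."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def perturb_stroke(ctrl, jitter=2.0, n=20):
    """Jitter a stroke's endpoints and control points, then resample the
    stroke from the perturbed Bezier curve - a rough stand-in for the
    local endpoint/control-point augmentation described above."""
    moved = [(x + random.uniform(-jitter, jitter),
              y + random.uniform(-jitter, jitter)) for x, y in ctrl]
    return [bezier_point(moved, i / (n - 1)) for i in range(n)]
```

Because the perturbation is confined to one stroke's control points, the rest of the word image stays unchanged, which is the "local" property the paper emphasizes.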
... Segmentation is a challenging problem in document scenarios when it comes to identifying touching characters and locating cutting points [11]. There are segmentation-free character-level annotation approaches as well, such as [12] for Mongolian character recognition. Before the research community switched over to deep learning methods, [13] proposed a support vector machine based methodology for recognizing handwritten digits. ...
Preprint
Full-text available
There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparing a text layer requires recognizing character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is very difficult, if not impossible. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying a disparate multitude of systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes, and any large-font or tiny-font scenarios can be handled in preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN), having very few layers and fewer filters, can be used for detecting individual subregion classes. For detecting several hundred classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of using pools of subregion-specific models is the ability to deal with the incremental addition of hundreds of newer classes over time without disturbing previous models in the continual learning scenario. Such an approach has a distinctive advantage over using a single monolithic model, where subregion classes share and interfere via a bulky common neural network. We report here an efficient algorithm for building subregion-specific lightweight CNN models.
The training data for the proposed CNN requires engineering synthetic data points that consider both the pattern of interest and non-patterns. We propose and validate the hypothesis that an image canvas with an optimal amount of pattern and non-pattern can be formulated using a mean squared error loss function to influence filter training from the data. The CNN thus trained can identify the character-object in the presence of several other objects on a generalized test image of a scanned document. In this setting, learning a filter in a CNN depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to some key observations: (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-expressed either, (iii) a non-pattern can be salt-and-pepper noise, and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual sub-region class models. We have carried out studies and report mean average precision scores on various datasets, including (1) MNIST digits (95.77), (2) EMNIST capital alphabet (81.26), (3) EMNIST small alphabet (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari words (93.20), and also on medical prescriptions, observing mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios, such as annotation of ancient manuscripts and handwritten health records.
... A very important component of recurrent networks in the language domain is attention, which multiplies the input vectors across time with a weight matrix to filter and emphasize relevant information. Although first approaches have already been published (e.g., [134] or [231]), inclusion in a framework such as Calamari is still pending. ...
Thesis
In recent years, great progress has been made in the area of Artificial Intelligence (AI) due to the possibilities of Deep Learning, which has steadily yielded new state-of-the-art results, especially in many image recognition tasks. Currently, in some areas, human performance is achieved or already exceeded. This development has already had an impact on the area of Optical Music Recognition (OMR), as several novel methods relying on Deep Learning have succeeded in specific tasks. Musicologists are interested in large-scale musical analysis and in publishing digital transcriptions in collections, enabling the development of tools for searching and data retrieval. The application of OMR promises to simplify and thus speed up the transcription process by providing fully or semi-automatic approaches. This thesis focuses on the automatic transcription of Medieval music with a focus on square notation, which poses a challenging task due to complex layouts, highly varying handwritten notations, and degradation. However, since handwritten music notations are quite complex to read, even for an experienced musicologist, it is to be expected that, even with new OMR techniques, manual corrections are required to obtain the transcriptions. This thesis presents several new approaches and open source software solutions for layout analysis and Automatic Text Recognition (ATR) of early documents and for OMR of Medieval manuscripts, providing state-of-the-art technology. Fully Convolutional Networks (FCN) are applied to the segmentation of historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations. The ATR engine Calamari is presented, which allows for ATR of early prints and also the recognition of lyrics. Configurable CNN/LSTM network architectures, trained with the segmentation-free CTC loss, are applied to the sequential recognition of text but also of monophonic music.
Finally, a syllable-to-neume assignment algorithm is presented, which represents the final step toward a complete transcription of the music. The evaluations show that the performance of any algorithm depends highly on the material at hand and the number of training instances. The presented staff line detection correctly identifies staff lines and staves with an F1-score above 99.5%. The symbol recognition yields a diplomatic Symbol Accuracy Rate (dSAR) above 90%, counting the number of correct predictions in the symbol sequence normalized by its length. The ATR of lyrics achieved a Character Accuracy Rate (CAR) (equivalently, the number of correct predictions normalized by the sentence length) above 93% when trained on 771 lyric lines of Medieval manuscripts, and of 99.89% when trained on around 3.5 million lines of contemporary printed fonts. The assignment of syllables to their corresponding neumes reached F1-scores of up to 99.2%. A direct comparison to previously published performances is difficult due to differing materials and metrics. However, estimations show that the values reported in this thesis exceed the state of the art in the area of square notation. A further goal of this thesis is to enable musicologists without a technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable Graphical User Interface (GUI) that encapsulates the technical details. For this purpose, this thesis presents the web application OMMR4all. Its fully functional workflow includes the proposed state-of-the-art machine learning algorithms and optionally allows for manual intervention at any stage to correct the output, preventing error propagation. To simplify the manual (post-) correction, OMMR4all provides an overlay editor that superimposes the annotations on a scan of the original manuscripts so that errors can easily be spotted.
The workflow is designed to be iteratively improvable by training better models as soon as new Ground Truth (GT) is available.
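The rates quoted in the thesis (dSAR, CAR) share one shape: correct predictions, i.e. one minus the edit distance, normalized by sequence length. A minimal sketch of such a normalized metric (function names are illustrative, not from the thesis):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def accuracy_rate(ref, hyp):
    """1 minus the length-normalized edit distance: the general shape behind
    symbol accuracy (dSAR) on symbol sequences and character accuracy (CAR)
    on text lines."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

For example, `accuracy_rate("neume", "neome")` is 0.8: one substitution over five reference characters.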
... This is a significant limitation of CNNs, since many complex problems (e.g. machine translation) are expressed as sequences whose lengths are not fixed or equal. For this reason, seq2seq models have been presented for machine translation [28], speech recognition [29], machine-printed text recognition [30], and handwriting recognition [31]. ...
... Recently, we presented a segmentation-free traditional Mongolian OCR system in [18]. Traditional Mongolian words can be recognized directly, which avoids the segmentation problem. ...