The 44 letters of the Pashto script with their pronunciation in English. The double-underlined characters are the special characters of the script not found in Arabic and Urdu, whereas the single-underlined characters vary in shape only.

Source publication

Baseline Isolated Printed Text Image Database for Pashto Script Recognition

Article

Jan 2023

Context 1

... the language difficult for the visual algorithms to detect and recognize. Among those 44 letters, the script includes 6 special characters that are not found in any other right-to-left cursive script. The characters vary not only in shape but also in pronunciation. This distinguishes the script from other right-to-left scripts, see Fig. 1. In addition to the six special characters, we have five other letters that vary in ...

Sample Malayalam characters with vowel sounds [28]

Enhancing Malayalam Palm Leaf Character Segmentation: An Improved Simplified Approach

Article

May 2024

Achieving precise character segmentation is vital for accurate character recognition. This paper introduces an innovative character segmentation algorithm specifically designed to address the unique properties of Malayalam characters found in Palm-Leaf Manuscripts (PLMs). The initial phase involves conducting an in-depth survey of existing characte...

Figure 4. (a) ResNet and (b) ConvNeXt feature extraction network block.

Figure 5. Deep bidirectional LSTM network.

Figure 6. Connectionist temporal classification (CTC).

Figure 8. Data enhancement of Uyghur images.

A Three-Stage Uyghur Recognition Model Combining the Attention Mechanism and Different Convolutional Recurrent Networks

Article

Aug 2023

Uyghur text recognition faces several challenges in the field due to the scarcity of publicly available datasets and the intricate nature of the script characterized by strong ligatures and unique attributes. In this study, we propose a unified three-stage model for Uyghur language recognition. The model is developed using a self-constructed Uyghur...

Performance Analysis of Vision Transformer Based Architecture for Cursive Handwritten Text Recognition

Conference Paper

Feb 2024

Computers have been given vision by researchers across the world for many years. Now it is the era of digitization. Recognizing handwritten text is a must for a computer vision system. Due to the variation and complexity of the cursive writing style, the holistic approach is mostly used for the recognition of cursive scripts. Though Convolutional N...

A text sample from Cyrillic script having exactly same shaped...

A text sample from Bengali script having almost similar alike...

The complete framework of the proposed method to recognize the script...

The accuracy obtained during the training, validation, and testing...

Performance comparison of the proposed method using accuracy,...

Script identification in handwritten and printed documents using convolutional recurrent connection

Article

Apr 2024

Amar Jindal

Identifying the script in multi-script handwritten or printed documents is one of the essential components of text recognition. The script identification module helps Optical Character Recognition (OCR) digitize the text present in multi-script handwritten or printed documents. The similarity of characters between two or more scripts...

Fig. 1 List of samples of Nusantara Scripts used in this research.

Fig. 3 The CNN Architecture in one of the Nusantara Characters Reader.

Fig. 4 ConvMixer's Basic Architecture Block.

Fig. 6 Performance of Script Type Recognition Model.

Deep Learning Approaches for Nusantara Scripts Optical Character Recognition

Article

Jul 2023

The number of speakers of regional languages who are able to read and to write traditional scripts in Indonesia is decreasing. If left unaddressed, this will lead to the extinction of Nusantara scripts and it is not impossible that their reading methods will be forgotten in the future. To anticipate this, this study aims to preserve the knowledge o...

Deep learning-based recognition system for pashto handwritten text: benchmark on PHTI

Article

Mar 2024

This article introduces a recognition system for handwritten text in the Pashto language, representing the first attempt to establish a baseline system using the Pashto Handwritten Text Imagebase (PHTI) dataset. Initially, the PHTI dataset was pre-processed to eliminate unwanted characters; subsequently, it was divided into training (70%), validation (15%), and test (15%) sets. The proposed recognition system is based on multi-dimensional long short-term memory (MD-LSTM) networks. A comprehensive empirical analysis was conducted to determine the optimal parameters for the proposed MD-LSTM architecture. Counter experiments were used to evaluate the performance of the proposed system in comparison with state-of-the-art models on the PHTI dataset. The novelty of the proposed model, compared to other state-of-the-art models, lies in its hidden layer size (i.e., 10, 20, 80) and its Tanh layer size (i.e., 20, 40). The system achieves a Character Error Rate (CER) of 20.77% as a baseline on the test set. The top 20 confusions are reported to check the performance and limitations of the proposed model. The results highlight complications and future perspectives for the Pashto language in the digital transition.
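The abstract above mentions a 70/15/15 split and a Character Error Rate (CER). As a minimal sketch of how those two quantities are commonly computed, the Python below shuffles and splits a list of samples and measures CER as total Levenshtein edits divided by total reference characters. This is an assumption about the usual definitions, not the authors' code: the function names (`split_dataset`, `edit_distance`, `character_error_rate`), the fixed seed, and the unit edit costs are all hypothetical choices made for illustration.

```python
import random


def split_dataset(items, train_pct=70, val_pct=15, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    Percentages are integers; whatever remains after the train and
    validation portions goes to the test set. (Hypothetical helper.)
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference char
                            curr[j - 1] + 1,      # insert a hypothesis char
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edits over total reference characters (assumed CER definition)."""
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Because Python strings are sequences of Unicode code points, the same functions apply unchanged to Pashto text; whether the paper counts code points or some other unit is not stated in the abstract.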
