Figure 1 - uploaded by Abdul Basit
The 44 letters of the Pashto script with their pronunciation in English. The double-underlined characters are the special characters of the script, not found in Arabic or Urdu, whereas the single-underlined characters vary in shape only.

Context in source publication

Context 1
... the language difficult for the visual algorithms to detect and recognize. Among those 44 letters, the script contains 6 special characters that are not found in any other right-to-left cursive script. These characters vary not only in shape but also in pronunciation, which distinguishes the script from other right-to-left scripts (see Fig. 1). In addition to the six special characters, we have five other letters that vary in ...

Citations

Article
This article introduces a recognition system for handwritten text in the Pashto language, representing the first attempt to establish a baseline system using the Pashto Handwritten Text Imagebase (PHTI) dataset. The PHTI dataset was first pre-processed to eliminate unwanted characters and then divided into training (70%), validation (15%), and test (15%) sets. The proposed recognition system is based on multi-dimensional long short-term memory (MD-LSTM) networks. A comprehensive empirical analysis was conducted to determine the optimal parameters for the proposed MD-LSTM architecture, and counter experiments were used to evaluate the performance of the proposed system against state-of-the-art models on the PHTI dataset. The novelty of the proposed model, compared to other state-of-the-art models, lies in its hidden layer sizes (i.e., 10, 20, 80) and its Tanh layer sizes (i.e., 20, 40). The system achieves a Character Error Rate (CER) of 20.77% as a baseline on the test set. The top 20 confusions are reported to examine the performance and limitations of the proposed model. The results highlight the complications and future prospects of the Pashto language in its digital transition.
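The abstract reports a Character Error Rate but does not say how it was computed. As a minimal sketch, assuming the common definition (total character-level edit distance divided by the total number of reference characters), CER can be calculated as below. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the cited article, and counting Unicode code points as characters is also an assumption.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed CER: summed edit distance over summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# One substituted letter in a four-letter Pashto word gives a CER of 0.25.
cer = character_error_rate(["پښتو"], ["پشتو"])
```

Under this definition, a CER of 20.77% would mean roughly one character edit for every five reference characters in the test set.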