Text to Speech Conversion

October 2016
Indian Journal of Science and Technology 9(38)

DOI:10.17485/ijst/2016/v9i38/102967

Authors:

S. Venkateswarlu

K L University

Duvvuri B K Kamesh

Malla Reddy Engineering College for Women

Sastry Jammalamadaka

K L University

Chintala Radhika Rani

K L University

This paper presents an innovative, efficient, and cost-effective real-time technique that lets users hear the contents of text images instead of reading them. It combines Optical Character Recognition (OCR) and a Text-to-Speech (TTS) synthesizer on a Raspberry Pi. Such a system helps visually impaired people interact with computers effectively through a vocal interface. Extracting text from color images is a challenging task in computer vision. The text-to-speech conversion scans and reads the English letters and numbers in an image using OCR and converts them to speech. The paper describes the design, implementation, and experimental results of the device, which consists of two modules: an image processing module and a voice processing module. The device was developed on a Raspberry Pi v2 with a 900 MHz processor.

*Author for correspondence

Indian Journal of Science and Technology, Vol 9(38), DOI: 10.17485/ijst/2016/v9i38/102967, October 2016

ISSN (Print) : 0974-6846

ISSN (Online) : 0974-5645

Text to Speech Conversion

S. Venkateswarlu1*, D. B. K. Kamesh1, J. K. R. Sastry2 and Radhika Rani2

1Department of CSE, K L University, Vaddeswaram, Guntur – 522502, Andhra Pradesh, India; somu23@kluniversity.in, kameshdbk@kluniversity.in, radhikarani_cse@kluniversity.in

2Department of ECM, K L University, Vaddeswaram, Guntur – 522502, Andhra Pradesh, India; drsastry@kluniversity.in

Keywords: Image Processing, OCR, Text Extraction, Text-to-speech, Voice Processing


1. Introduction

Optical Character Recognition (OCR) is a process that converts scanned or printed text images [1] and handwritten text into editable text for further processing. This paper presents a robust approach for extracting text and converting it to speech. The device was tested on the Raspberry Pi platform. The Raspberry Pi is initially connected to the internet through VLAN. The software is installed from the command line in three steps:

1. Download the installation script.
2. Convert the script to executable form.
3. Start the script, which does the rest of the installation work.

The device is set up as shown in Figure 1. The webcam is manually focused on the text. A delay of around 7 seconds is provided before the picture is taken, which allows the webcam to be refocused if it has been accidentally defocused. After the delay, the picture is taken and processed by the Raspberry Pi, and the spoken words of the text are heard through an earphone or speaker plugged into the Raspberry Pi's audio jack.

Figure 1. Block diagram of text to speech conversion.
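
The capture sequence described above (wait about 7 seconds, then take one picture) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the paper does not name its capture tool, so the `fswebcam` command, the file name `page.jpg`, and the function names are illustrative choices rather than the authors' implementation.

```python
import subprocess
import time
from typing import Callable, List

FOCUS_DELAY_S = 7  # "around 7 seconds" in the text


def capture_command(image_path: str) -> List[str]:
    """Build the webcam capture command.

    Assumption: fswebcam is installed; the paper does not say which
    capture program was used.
    """
    return ["fswebcam", "--no-banner", image_path]


def capture_after_delay(
    image_path: str,
    delay_s: float = FOCUS_DELAY_S,
    sleep: Callable[[float], None] = time.sleep,
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Wait so the user can refocus the webcam, then take one picture."""
    sleep(delay_s)
    run(capture_command(image_path), check=True)
    return image_path
```

Passing `sleep` and `run` as parameters keeps the function testable on a machine without a webcam; on the device, `capture_after_delay("page.jpg")` would use the real `time.sleep` and `subprocess.run`.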

2. Methodology

The text-to-speech device consists of two main modules: the image processing module and the voice processing module. The image processing module captures an image using the camera and converts the image into text. The voice processing module changes the text into sound and processes it with specific physical characteristics so that the sound can be understood. Figure 2 shows the block diagram of the text-to-speech device. The first block is the image processing module, where OCR converts the .jpg image to .txt form. The second block is the voice processing module, which converts the .txt file to speech.

Figure 2. Block diagram of text-to-speech device.

OCR is the key element of the image processing module. OCR, or Optical Character Recognition, is a technology that automatically recognizes characters through an optical mechanism. It imitates the human sense of sight: the camera replaces the eye, and image processing in the computer substitutes for the human brain [2]. Tesseract OCR is an OCR engine based on matrix matching [3]. Tesseract was selected for its flexibility and extensibility, because an active community of researchers continues to develop it, and because it supports 149 languages. In this project we identify English alphabet characters. Before the image is fed to the OCR engine, it is converted to a binary image to increase recognition accuracy. The binary conversion is done with ImageMagick, another open source tool for image manipulation. The output of OCR is text, which is stored in a file (speech.txt). Machines still suffer from defects such as distortion at the edges and dim-light effects, so it is still difficult for most OCR engines to produce highly accurate text [4]. Supporting conditions are needed to keep such defects to a minimum.

Tesseract OCR Implementation.
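
The binarization and recognition steps described above can be sketched as follows. This is a sketch under assumptions, not the authors' code: the threshold values (128 on a 0-255 scale, 50% for ImageMagick), the image file names, and the function names are hypothetical; only ImageMagick, Tesseract, English text, and the output file speech.txt come from the paper. The pure-Python `binarize` function only illustrates what a binary conversion does to grey values; on the device the conversion is delegated to ImageMagick.

```python
from typing import List


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each 0-255 grey value to 255 (white) if it is at or above
    the threshold, otherwise to 0 (black)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def binarize_command(src: str, dst: str, threshold_pct: int = 50) -> List[str]:
    """ImageMagick command that converts src to a black-and-white image.

    Assumption: the ImageMagick 6 `convert` program is on the PATH and a
    fixed 50% threshold suits the lighting; the paper gives no value.
    """
    return ["convert", src, "-colorspace", "Gray",
            "-threshold", f"{threshold_pct}%", dst]


def ocr_command(image: str, output_base: str = "speech",
                lang: str = "eng") -> List[str]:
    """Tesseract command; Tesseract appends ".txt" to output_base,
    so the default writes speech.txt."""
    return ["tesseract", image, output_base, "-l", lang]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in order: first `binarize_command("page.jpg", "page_bw.png")`, then `ocr_command("page_bw.png")`.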

2.1 Software Design

The software processes the input image and converts it into text format. The software implementation is shown in Figure 3.

Figure 3. Software design of image processing module.

2.2 The Voice Processing Module

In this module, text is converted to speech. The output of OCR is text, which is stored in a file (speech.txt). Festival software is used to convert this text to speech. Festival is an open source Text-to-Speech (TTS) system [7,8] that is available in many languages. In this project, the English TTS system [9–11] is used for reading the text.
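
Reading speech.txt aloud with Festival can be sketched as below. The `festival --tts <file>` invocation is Festival's command-line mode for speaking a text file; the function names and the choice to skip empty OCR output are assumptions added for illustration and are not described in the paper.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def speak_command(text_file: str = "speech.txt") -> List[str]:
    """Festival command that reads a text file aloud."""
    return ["festival", "--tts", text_file]


def speak_file(text_file: str = "speech.txt",
               run: Callable[..., object] = subprocess.run) -> bool:
    """Speak the OCR output; return False if there is nothing to read.

    Assumption: a missing or blank file means OCR found no text, so
    Festival is not started at all.
    """
    path = Path(text_file)
    if not path.exists():
        return False
    if not path.read_text(encoding="utf-8", errors="replace").strip():
        return False
    run(speak_command(text_file), check=True)
    return True
```

Checking for empty output first avoids launching the synthesizer when the image contained no recognizable text.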

3. Results

Observed outcomes of the project:

• Text is extracted from the image and converted to audio.
• Both capital and small letters are recognized.
• Numbers are recognized as well.
• The reading distance range was 38-42 cm.
• The character font size should be at least 12 pt.
• The maximum tilt of the text line is 4-5 degrees from the vertical.
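
The operating limits listed above can be expressed as a simple pre-capture check. This is a hypothetical helper, not part of the reported device: the function name, the message strings, and the choice of 5 degrees (the upper end of the reported 4-5 degree range) as the tilt limit are assumptions.

```python
from typing import List

MIN_DISTANCE_CM = 38.0
MAX_DISTANCE_CM = 42.0
MIN_FONT_PT = 12.0
MAX_TILT_DEG = 5.0  # upper end of the reported 4-5 degree range


def capture_problems(distance_cm: float, font_size_pt: float,
                     tilt_deg: float) -> List[str]:
    """Return the reported limits that a capture setup violates."""
    problems = []
    if not MIN_DISTANCE_CM <= distance_cm <= MAX_DISTANCE_CM:
        problems.append("distance outside 38-42 cm")
    if font_size_pt < MIN_FONT_PT:
        problems.append("font smaller than 12 pt")
    if abs(tilt_deg) > MAX_TILT_DEG:
        problems.append("tilt greater than 5 degrees")
    return problems
```

An empty list means the setup is within all three reported limits, for example `capture_problems(40, 12, 3)`.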

4. Conclusion

The text-to-speech device converts a text image into sound with sufficiently high performance and a readability tolerance of less than 2%, with an average processing time of less than three minutes for an A4 page. The device is portable, does not require an internet connection, and can be used independently. Through this method, the editing of books or web pages can also be made easier.

5. References

1. Archana A, Shinde D. Text pre-processing and text segmentation for OCR. International Journal of Computer Science Engineering and Technology. 2012:810–12.

2. Mithe R, Indalkar S, Divekar N. Optical character recognition. International Journal of Recent Technology and Engineering. 2013 Mar; 2(1).

3. Smith R. An overview of the Tesseract OCR engine, USA: Google Inc; 2007.

4. Shah H, Shah A. Optical character recognition of Gujarati numerical. International Conference on Signals, Systems and Automation. 2009; 49–53.

5. Monk S. Raspberry pi cook.

6. Text localization and extraction in images using mathematical morphology and OCR Techniques; 2013.

7. Vanitha E, Kasarla PK, Kuamarswamy E. Implementation of text-to-speech for real time embedded system using Raspberry Pi processor. International Journal and Magazine of Engineering Technology Management and Research. 2015 Jul:1995.

8. Kumar GS, Krishna MNVLM. Low cost speech recognition system running on Raspberry Pi to support Automation applications. International Journal of Engineering Trends and Technology. 2015; 21(5).

9. Bhargava A, Nath KV, Sachdeva P, Samel M. Reading assistant for visually Impaired. International Journal of current Engineering and Technology. 2015 Apr; 5(2).

10. Gomes LCT, Nagle EJ, Chiquito JG. Text-to-speech conversion system for Brazilian Portuguese using a formant-based synthesis technique. LPS-DECOM-FEEC-Unicamp.

11. Sim Liew Fong, Abdelrahman Osman Elfaki, Md Gapar bin Md Johar & Kevin Loo Tow Aik, Mobile Language Translator, 5th Malaysian Conference in Software Engineering (Misses); 2011.

12. Kamesh DBK, Nazma SK, Sastry JKR, Venkateswarlu S. Camera based text to speech conversion, obstacle and currency detection for blind persons. Indian Journal of Science and Technology. 2016 Aug; 9(30).
