Text to Speech Conversion

Abstract

This paper introduces an efficient, low-cost, real-time technique that lets a user hear the contents of text images instead of reading them. It combines Optical Character Recognition (OCR) with a Text-to-Speech (TTS) synthesizer on a Raspberry Pi. Such a system helps visually impaired people interact with computers effectively through a voice interface. Extracting text from color images is a challenging task in computer vision. The text-to-speech conversion described here scans the English letters and numbers in an image using OCR and converts them to speech. The paper describes the design, implementation and experimental results of the device, which consists of two modules: an image processing module and a voice processing module. The device was built on a Raspberry Pi v2 with a 900 MHz processor.
*Author for correspondence
Indian Journal of Science and Technology, Vol 9(38), DOI: 10.17485/ijst/2016/v9i38/102967, October 2016
ISSN (Print) : 0974-6846
ISSN (Online) : 0974-5645
S. Venkateswarlu1*, D. B. K. Kamesh1, J. K. R. Sastry2 and Radhika Rani2
1Department of CSE, K L University, Vaddeswarm, Guntur – 522502, Andhra Pradesh, India; somu23@kluniversity.in,
kameshdbk@kluniversity.in, radhikarani_cse@kluniversity.in
2Department of ECM, K L University, Vaddeswarm, Guntur – 522502, Andhra Pradesh, India;
drsastry@kluniversity.in
Keywords: Image Processing, OCR, Text Extraction, Text-to-speech, Voice Processing
1. Introduction
Optical Character Recognition (OCR) is a process that converts scanned or printed text images [1] and handwritten text into editable text for further processing. This paper presents a robust approach for extracting text and converting it to speech. The device was tested on the Raspberry Pi platform. The Raspy is initially connected to the internet through VLAN. The software is installed from the command line. The steps are as follows:
1. e rst setup is to download the installation script,
2. Second step is to convert it to executable form and
3. e last step starts the script which does the rest of
the installation work.
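The three installation steps can be sketched as a short Python helper that builds the corresponding shell commands. This is only an illustration: the paper does not name the installation script or where it is hosted, so the URL `https://example.org/install.sh` and the file name `install.sh` are hypothetical placeholders, and the use of `wget` for the download is an assumption.

```python
import shlex


def install_commands(script_url, script_name="install.sh"):
    """Return the three shell commands for the steps listed above.

    script_url and script_name are hypothetical placeholders; the paper
    does not say which script is downloaded or where it is hosted.
    """
    return [
        ["wget", "-O", script_name, script_url],  # step 1: download the script
        ["chmod", "+x", script_name],             # step 2: make it executable
        ["./" + script_name],                     # step 3: run the installer
    ]


# Print the commands instead of running them, so the sketch has no side effects.
for command in install_commands("https://example.org/install.sh"):
    print(shlex.join(command))
```

On a real device each command list could be passed to `subprocess.run(command, check=True)` in order, stopping at the first failure.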
The device is set up as shown in Figure 1. The webcam is manually focused on the text. A delay of around 7 seconds is provided before the picture is taken, which gives time to refocus the webcam if it has been accidentally defocused. After the delay, the picture is taken and processed by the Raspy, and the spoken words of the text are heard through an earphone or speaker plugged into the Raspy's audio jack.
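The delayed capture described above might look like the following sketch. The paper does not say which capture program was used; `fswebcam` is assumed here purely for illustration, and the function names and the `capture.jpg` file name are hypothetical.

```python
import subprocess
import time

FOCUS_DELAY_S = 7  # the text mentions a delay of around 7 seconds


def capture_after_delay(capture, delay_s=FOCUS_DELAY_S, sleep=time.sleep):
    """Wait delay_s seconds so the webcam can be refocused, then capture.

    capture is any zero-argument callable that takes the picture and
    returns the image path; sleep is injectable so the logic can be
    exercised without actually waiting.
    """
    sleep(delay_s)
    return capture()


def fswebcam_capture(image_path="capture.jpg"):
    """Take one picture with fswebcam (an assumed tool, not named in the paper)."""
    subprocess.run(["fswebcam", "--no-banner", image_path], check=True)
    return image_path
```

On the device, `capture_after_delay(fswebcam_capture)` would wait seven seconds and then return the path of the captured image for the OCR stage.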
Figure 1. Block diagram of text to speech conversion.
2. Methodology
The text-to-speech device consists of two main modules, the image processing module and the voice processing module.
The image processing module captures an image with the camera and converts the image into text. The voice processing module converts the text into sound and processes it with specific physical characteristics so that the sound can be understood. Figure 2 shows the block diagram of the text-to-speech device: the first block is the image processing module, where OCR converts the .jpg image to .txt form, and the second is the voice processing module, which converts the .txt file to speech.
Figure 2. Block diagram of text-to-speech device.
OCR is an important element in the image processing module. OCR, or Optical Character Recognition, is a technology that automatically recognizes characters through an optical mechanism. It imitates the human sense of sight: the camera replaces the eye, and image processing in the computer substitutes for the human brain [2]. Tesseract OCR is an OCR engine that uses matrix matching [3]. Tesseract was selected for its flexibility and extensibility, because an active community of researchers continues to develop it, and because it supports 149 languages. In this project we identify English characters. Before the image is fed to the OCR engine, it is converted to a binary image to increase recognition accuracy. The binary conversion is done with ImageMagick, another open source tool for image manipulation. The output of OCR is text, which is stored in a file (speech.txt). Capture devices still introduce defects such as distortion at the edges and dim-light effects, so it is still difficult for most OCR engines to produce highly accurate text [4]. Some supporting conditions are needed to keep these defects to a minimum.
Tesseract OCR Implementation
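The binarize-then-recognize sequence described above can be sketched as two command-line calls driven from Python. The paper does not give the exact options it used, so the 50% threshold, the intermediate file name `binary.png` and the input name `capture.jpg` are assumptions. The `convert` command is the ImageMagick 6 entry point; newer installations may expose it as `magick` instead.

```python
import subprocess
from pathlib import Path


def binarize_command(src, dst, threshold_percent=50):
    """ImageMagick call: convert to grayscale, then threshold to black and white.

    The 50% default is an illustrative assumption, not a value from the paper.
    """
    return ["convert", src, "-colorspace", "Gray",
            "-threshold", f"{threshold_percent}%", dst]


def ocr_command(image, output_base="speech", lang="eng"):
    """Tesseract call: it appends .txt to output_base, giving speech.txt."""
    return ["tesseract", image, output_base, "-l", lang]


def image_to_text(src="capture.jpg", binary="binary.png"):
    """Run both tools in order and return the recognized text."""
    subprocess.run(binarize_command(src, binary), check=True)
    subprocess.run(ocr_command(binary), check=True)
    return Path("speech.txt").read_text()
```

Keeping the command construction separate from `image_to_text` makes it easy to inspect or adjust the options (for example the threshold) without running the tools.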
2.1 Soware Design
Soware processes the input image and converted into
text format. e soware implementation is showed in
Figure 3.
Figure 3. Soware design of image processing module.
2.2 The Voice Processing Module
In this module, text is converted to speech. The output of OCR is the text, which is stored in a file (speech.txt). Here, Festival software is used to convert the text to speech. Festival is an open source Text To Speech (TTS) system [7,8], which is available in many languages. In this project, the English TTS system [9–11] is used for reading the text.
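A minimal sketch of the voice processing step, assuming Festival is installed and invoked through its command-line `--tts` mode, which reads a text file aloud. The paper does not state how Festival was called, so the function names here are hypothetical.

```python
import subprocess


def festival_command(text_file="speech.txt"):
    """Build the Festival call that speaks the contents of text_file."""
    return ["festival", "--tts", text_file]


def speak_file(text_file="speech.txt"):
    """Speak the OCR output through the default audio device."""
    subprocess.run(festival_command(text_file), check=True)
```

Calling `speak_file()` after the OCR stage has written speech.txt would play the recognized text through the earphone or speaker.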
3. Results
Observed outcomes of the project:
• Text is extracted from the image and converted to audio.
• It recognizes both capital and small letters.
• It recognizes numbers as well.
• The range of reading distance was 38-42 cm.
• The character font size should be at least 12 pt.
• The maximum tilt of the text line is 4-5 degrees from the vertical.
4. Conclusion
The text-to-speech device can convert a text image into sound with reasonably high performance and a readability tolerance of less than 2%, with an average processing time of less than three minutes for an A4 page. This portable device does not require an internet connection and can be used independently. Through this method, the editing of books or web pages can be made easier.
5. References
1. Archana A, Shinde D. Text pre-processing and text segmentation for OCR. International Journal of Computer Science Engineering and Technology. 2012:810–12.
2. Mithe R, Indalkar S, Divekar N. Optical character recognition. International Journal of Recent Technology and Engineering. 2013 Mar; 2(1).
3. Smith R. An overview of the Tesseract OCR engine, USA: Google Inc; 2007.
4. Shah H, Shah A. Optical character recognition of Gujarati numerical. International Conference on Signals, Systems and Automation. 2009; 49–53.
5. Monk S. Raspberry pi cook.
6. Text localization and extraction in images using mathematical morphology and OCR Techniques; 2013.
7. Vanitha E, Kasarla PK, Kuamarswamy E. Implementation of text-to-speech for real time embedded system using Raspberry Pi processor. International Journal and Magazine of Engineering Technology Management and Research. 2015 Jul:1995.
8. Kumar GS, Krishna MNVLM. Low cost speech recognition system running on Raspberry Pi to support Automation applications. International Journal of Engineering Trends and Technology. 2015; 21(5).
9. Bhargava A, Nath KV, Sachdeva P, Samel M. Reading assistant for visually Impaired. International Journal of current Engineering and Technology. 2015 Apr; 5(2).
10. Gomes LCT, Nagle EJ, Chiquito JG. Text-to-speech conversion system for Brazilian Portuguese using a formant-based synthesis technique. LPS-DECOM-FEEC-Unicamp.
11. Sim Liew Fong, Abdelrahman Osman Elfaki, Md Gapar bin Md Johar & Kevin Loo Tow Aik, Mobile Language Translator, 5th Malaysian Conference in Software Engineering (Misses); 2011.
12. Kamesh DBK, Nazma SK, Sastry JKR, Venkateswarlu S. Camera based text to speech conversion, obstacle and currency detection for blind persons. Indian Journal of Science and Technology. 2016 Aug; 9(30).