*Author for correspondence
Indian Journal of Science and Technology, Vol 9(38), DOI: 10.17485/ijst/2016/v9i38/102967, October 2016
ISSN (Print) : 0974-6846
ISSN (Online) : 0974-5645
Text to Speech Conversion
S. Venkateswarlu1*, D. B. K. Kamesh1, J. K. R. Sastry2 and Radhika Rani2
1Department of CSE, K L University, Vaddeswarm, Guntur – 522502, Andhra Pradesh, India; somu23@kluniversity.in,
kameshdbk@kluniversity.in, radhikarani_cse@kluniversity.in
2Department of ECM, K L University, Vaddeswarm, Guntur – 522502, Andhra Pradesh, India;
drsastry@kluniversity.in
Keywords: Image Processing, OCR, Text Extraction, Text-to-speech, Voice Processing
Abstract
1. Introduction
Optical Character Recognition (OCR) is a process that
converts scanned or printed text images [1] and handwritten
text into editable text for further processing. This paper
presents a robust approach for extracting text and
converting it to speech. The device was tested on the
Raspberry Pi platform. The Raspberry Pi (Raspy) is initially connected
to the internet through VLAN. The software is installed
from the command line. The following steps are to be followed:
1. The first step is to download the installation script.
2. The second step is to convert it to executable form.
3. The last step starts the script, which does the rest of
the installation work.
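
The three installation steps above can be sketched in Python as follows. This is only an illustration: the paper does not give the script's URL or file name, so both are left as caller-supplied parameters, and the function names are hypothetical.

```python
import os
import stat
import subprocess
import urllib.request


def download_script(url: str, dest: str) -> str:
    """Step 1: fetch the installation script (the URL is supplied by the caller)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def make_executable(path: str) -> None:
    """Step 2: add execute permission, roughly what `chmod +x` does."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def run_script(path: str) -> int:
    """Step 3: start the script and return its exit code."""
    return subprocess.run([os.path.abspath(path)], check=False).returncode
```

A non-zero return value from `run_script` would indicate that the installation script reported a failure.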
The device is set up as shown in Figure 1. The webcam
is manually focused on the text. A delay of around
7 seconds is provided before the picture is taken, which
helps to refocus the webcam if it is accidentally defocused.
After the delay, the picture is taken and processed by the Raspy,
and the spoken words of the text are heard through the earphone
or speaker plugged into the Raspy through its audio jack.
Figure 1. Block diagram of text to speech conversion.
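
The capture flow described above (wait, take the picture, process it, play the audio) can be sketched as a small Python function. The paper does not name the capture tool, so the three stages are passed in as callables; `read_text_aloud` and its parameter names are hypothetical, and only the 7-second delay comes from the text.

```python
import time
from typing import Callable


def read_text_aloud(
    capture: Callable[[], str],
    recognize: Callable[[str], str],
    speak: Callable[[str], None],
    focus_delay_s: float = 7.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for the webcam to settle, then capture, recognize, and speak."""
    sleep(focus_delay_s)          # time to refocus the webcam if needed
    image_path = capture()        # take the picture
    text = recognize(image_path)  # image -> text
    speak(text)                   # text -> audio on earphone or speaker
    return text
```

Passing the stages in as arguments keeps the timing logic independent of any particular camera or speech tool.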
2. Methodology
The text-to-speech device consists of two main modules: the
image processing module and the voice processing module.
Indian Journal of Science and Technology | Vol 9 (38) | October 2016 | www.indjst.org
The image processing module captures an image using the camera
and converts the image into text. The voice processing module
changes the text into sound and processes it with
specific physical characteristics so that the sound can be
understood. Figure 2 shows the block diagram of the
text-to-speech device. The first block is the image processing module,
where OCR converts the .jpg image to .txt form. The second is the voice
processing module, which converts the .txt file to speech.
Figure 2. Block diagram of text-to-speech device.
OCR is an important element of the image processing
module. OCR, or Optical Character Recognition, is
a technology that automatically recognizes characters
through an optical mechanism. It imitates
the human sense of sight: the camera
replaces the eye, and image processing
in the computer substitutes for the
human brain [2]. Tesseract OCR is a type of OCR engine
with matrix matching [3]. The Tesseract engine was selected
for its flexibility and extensibility,
because an active community of researchers
continues to develop it, and because Tesseract
OCR can support 149 languages. In this project we
identify English characters. Before the image is fed to
the OCR, it is converted to a binary image to increase the
recognition accuracy. The binary conversion is done
using ImageMagick software, which is another open
source tool for image manipulation. The output of OCR
is the text, which is stored in a file (speech.txt). Machines
still have defects such as distortion at the edges and dim
light effects, so it is still difficult for most OCR engines to
produce highly accurate text [4]. Some supporting
conditions are needed to keep such defects to a minimum.
Tesseract OCR Implementation.
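
As an illustration of how the binarization and recognition steps could be chained, the sketch below builds the ImageMagick and Tesseract command lines from Python. The paper does not list the exact commands it used, so the 50% threshold, the intermediate file names, and the function names are assumptions; only the output file speech.txt comes from the text. On ImageMagick 7 the executable is `magick` rather than `convert`.

```python
import subprocess
from typing import List


def binarize_command(src: str, dst: str, threshold: str = "50%") -> List[str]:
    """ImageMagick call that converts src to a black-and-white image."""
    return ["convert", src, "-colorspace", "Gray", "-threshold", threshold, dst]


def ocr_command(image: str, output_base: str = "speech", lang: str = "eng") -> List[str]:
    """Tesseract call; Tesseract appends '.txt' to output_base itself."""
    return ["tesseract", image, output_base, "-l", lang]


def image_to_text(src: str, binary: str = "binary.png", output_base: str = "speech") -> str:
    """Run both tools in sequence and return the recognized text."""
    subprocess.run(binarize_command(src, binary), check=True)
    subprocess.run(ocr_command(binary, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as fh:
        return fh.read()
```

Calling `image_to_text("page.jpg")` would require both tools to be installed and would leave the recognized text in speech.txt.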
2.1 Software Design
The software processes the input image and converts it into
text format. The software implementation is shown in
Figure 3.
Figure 3. Software design of image processing module.
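
To make the binary conversion that precedes OCR in the image processing module concrete, the sketch below thresholds a grayscale image in pure Python. It uses Otsu's method to pick the threshold, which is a swapped-in technique for illustration only: the paper performs this step with ImageMagick and does not state which threshold it used. The image is assumed to be a list of rows of 8-bit gray levels.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # count and gray-level sum of pixels <= t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 0 (dark) or 255 (bright) using Otsu's threshold."""
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]
```

Dark text on a bright page then becomes pure black on pure white, which is the kind of input the OCR step is described as receiving.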
2.2 The Voice Processing Module
In this module, text is converted to speech. The
output of OCR is the text, which is stored in a file
(speech.txt). Here, Festival software is used to convert the text to
speech. Festival is an open source Text To Speech (TTS) [7,8]
system, which is available in many languages. In this
project, the English TTS [9–11] system is used for reading the text.
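
A minimal sketch of how the voice processing module might hand speech.txt to Festival from Python is shown below. It relies on Festival's `--tts` command-line mode, which reads a text file aloud; the function names are hypothetical, and the paper does not state how Festival was invoked.

```python
import shutil
import subprocess
from typing import List


def festival_command(text_file: str = "speech.txt") -> List[str]:
    """Festival call that speaks the contents of text_file."""
    return ["festival", "--tts", text_file]


def speak_file(text_file: str = "speech.txt") -> bool:
    """Speak the file if Festival is installed; report whether it ran."""
    if shutil.which("festival") is None:
        return False  # Festival not found on PATH
    subprocess.run(festival_command(text_file), check=True)
    return True
```

Checking for the executable first lets the caller report a missing installation instead of failing with an exception.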
3. Results
Observed outcomes of the project:
• Text is extracted from the image and converted
to audio.
• It recognizes both capital and small letters.
• It recognizes numbers as well.
• The range of reading distance was 38-42 cm.
• The character font size should be at least 12 pt.
• The maximum tilt of the text line is 4-5 degrees from
the vertical.
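
The operating limits listed above could be encoded as a simple pre-check, sketched below. The limits are taken from the list, with the upper value of the 4-5 degree range assumed as the tilt bound; the class and function names are hypothetical and not part of the described device.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReadingConditions:
    distance_cm: float   # camera-to-page distance
    font_size_pt: float  # character font size
    tilt_deg: float      # tilt of the text line


def unmet_conditions(c: ReadingConditions, max_tilt_deg: float = 5.0) -> List[str]:
    """Return the reported limits that the given setup does not satisfy."""
    problems = []
    if not 38.0 <= c.distance_cm <= 42.0:
        problems.append("distance outside 38-42 cm")
    if c.font_size_pt < 12.0:
        problems.append("font smaller than 12 pt")
    if abs(c.tilt_deg) > max_tilt_deg:
        problems.append("tilt above limit")
    return problems
```

An empty list means the setup falls inside the range in which the results were observed.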
4. Conclusion
The text-to-speech device can change the text image input
into sound with a performance that is high enough and
a readability tolerance of less than 2%, with an average
processing time of less than three minutes for A4 paper size.
This portable device does not require an internet
connection and can be used independently by people. Through
this method, we can make the editing process of books or web
pages easier.
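
The paper does not define how the readability tolerance is measured. One possible reading, offered here only as an assumption, is a character error rate: the edit distance between the reference text and the recognized text, divided by the length of the reference. A sketch of that calculation follows, with hypothetical function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, recognized: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, recognized) / len(reference)
```

Under this reading, a tolerance below 2% would correspond to `character_error_rate(...) < 0.02` for a page of known text.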
5. References
1. Archana A, Shinde D. Text pre-processing and text seg-
mentation for OCR. International Journal of Computer
Science Engineering and Technology. 2012:810–12.
2. Mithe R, Indalkar S, Divekar N. Optical character recog-
nition. International Journal of Recent Technology and
Engineering. 2013 Mar; 2(1).
3. Smith R. An overview of the Tesseract OCR engine, USA:
Google Inc; 2007.
4. Shah H, Shah A. Optical character recognition of Gujarati
numerical. International Conference on Signals, Systems
and Automation. 2009; 49–53.
5. Monk S. Raspberry pi cook.
6. Text localization and extraction in images using mathemat-
ical morphology and OCR Techniques; 2013.
7. Vanitha E, Kasarla PK, Kuamarswamy E. Implementation
of text-to-speech for real time embedded system using
Raspberry Pi processor. International Journal and Magazine
of Engineering Technology Management and Research.
2015 Jul:1995.
8. Kumar GS, Krishna MNVLM. Low cost speech recognition
system running on Raspberry Pi to support Automation
applications. International Journal of Engineering Trends
and Technology. 2015; 21(5).
9. Bhargava A, Nath KV, Sachdeva P, Samel M. Reading assis-
tant for visually Impaired. International Journal of current
Engineering and Technology. 2015 Apr; 5(2).
10. Gomes LCT, Nagle EJ, Chiquito JG. Text-to-speech conver-
sion system for Brazilian Portuguese using a formant-based
synthesis technique. LPS-DECOM-FEEC-Unicamp.
11. Sim Liew Fong, Abdelrahman Osman Elfaki, Md Gapar
bin Md Johar, Kevin Loo Tow Aik. Mobile Language
Translator. 5th Malaysian Conference in Software
Engineering (Misses); 2011.
12. Kamesh DBK, Nazma SK, Sastry JKR, Venkateswarlu S.
Camera based text to speech conversion, obstacle and cur-
rency detection for blind persons. Indian Journal of Science
and Technology. 2016 Aug; 9(30).