Figure 2 - uploaded by Duvvuri B K Kamesh Duvvuri
Block diagram of text-to-speech device. 

Source publication
Article
Full-text available
The present paper has introduced an innovative, efficient, real-time and cost-effective technique that enables users to hear the contents of text images instead of reading through them. It combines the concepts of Optical Character Recognition (OCR) and a Text-to-Speech Synthesizer (TTS) on a Raspberry Pi. This kind of system helps visually impaired peop...
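The OCR-to-TTS pipeline the abstract describes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Tesseract and eSpeak command-line tools are assumed stand-ins for the unnamed OCR engine and speech synthesizer, and `clean_text` is a hypothetical helper.

```python
import re
import subprocess

def clean_text(raw: str) -> str:
    """Drop characters OCR commonly misreads and collapse whitespace."""
    text = re.sub(r"[^A-Za-z0-9.,;:'\"!? -]", " ", raw)
    return re.sub(r"\s+", " ", text).strip()

def image_to_speech(image_path: str) -> str:
    """Run Tesseract OCR on the image, then speak the result with eSpeak."""
    raw = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    ).stdout
    text = clean_text(raw)
    subprocess.run(["espeak", text], check=True)  # assumed TTS back end
    return text
```

On a Raspberry Pi both tools are installable through the package manager and run comfortably on the board, which fits the cost argument the abstract makes.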

Similar publications

Thesis
Full-text available
Image and speech processing is one of the trending research areas in machine learning that contributes immensely to the field of artificial intelligence. It enhances raw images received from gadgets such as a camera or a mobile phone in normal day-to-day life for various applications. The conversion of images to text as well as speech can be of great bene...
Article
Full-text available
Background Following neoadjuvant chemotherapy, surgical resection is one of the most preferred treatment options for locally advanced gastric cancer patients. However, the optimal time interval between chemotherapy and surgery is unclear. This review aimed to identify the optimal time interval between neoadjuvant chemotherapy and surgery for advanc...
Article
Full-text available
Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments because most of these systems were not designed to handle noisy environments. Several works attempted to address this problem by using offline fine-tuning to adapt their TTS t...

Citations

... Venkateswarlu et al. [11] investigated the efficiency and cost-effectiveness of developing a Raspberry Pi-based system for text-to-speech translation to assist visually impaired students in their studies. The system is simple to use, as the Raspberry Pi is a credit-card-sized device with full computing capability. ...
Conference Paper
Full-text available
Most blind and visually impaired students in third-world countries still use mechanical Braille for their education. With the advancement of technology and the spread of electronic communication, paper-based Braille is not effective and efficient enough. The Raspberry Pi-based Braille keyboard design with audio output is a low-cost electronic keyboard whose main features are to vocalize Braille characters written by a visually impaired student and display them on an LCD screen. Proposed to promote an interactive educational experience among students, teachers and parents, the Braille keyboard is affordable and cost-effective with advanced features. The design of the device is simple, as it is based on Raspberry Pi technology. The user hears the output after a short buzzer beep when the character typing process is finished. gTTS (Google Text-to-Speech) is a Python package, and Google Translate's text-to-speech API is used to convert text to speech. The data is displayed on an LCD screen for the non-visually impaired (teacher/parent). The Braille keyboard design is simulated in the Proteus simulation program. This work focuses on developing, for later stages, a Braille keyboard that allows users to use the Braille writing system to enter text and communicate with digital devices.
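The keyboard's core step, mapping a pressed six-dot chord to a character before handing it to gTTS, amounts to a lookup table. A minimal sketch, assuming cells arrive as sets of dot numbers (1-6); only the letters a-e are filled in here:

```python
# Each Braille cell is encoded as the set of raised dots, numbered 1-6.
BRAILLE_TO_CHAR = {
    frozenset({1}): "a",
    frozenset({1, 2}): "b",
    frozenset({1, 4}): "c",
    frozenset({1, 4, 5}): "d",
    frozenset({1, 5}): "e",
    # ... remaining cells of the Braille alphabet
}

def decode_cell(pressed_dots) -> str:
    """Map the dots pressed on the six-key keyboard to a character;
    unknown chords fall back to '?'."""
    return BRAILLE_TO_CHAR.get(frozenset(pressed_dots), "?")
```

The decoded character would then be passed to gTTS for vocalization and echoed on the LCD for the sighted teacher or parent, as the abstract describes.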
... Here each of the characters is matched against its corresponding template and saved as normalized text. The recognized text is then converted into speech [4] through a headset using a TTS engine. For object detection, the captured image is passed to the R-CNN layer. ...
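Template matching of the kind the excerpt describes can be illustrated on tiny binarized glyphs: score each template by the fraction of agreeing pixels and keep the best label. This is a toy sketch with hypothetical 3x3 templates, not the cited system's implementation:

```python
def match_score(glyph, template):
    """Fraction of pixels where the binarized glyph agrees with the template."""
    hits = sum(g == t
               for row_g, row_t in zip(glyph, template)
               for g, t in zip(row_g, row_t))
    return hits / (len(glyph) * len(glyph[0]))

def recognize(glyph, templates):
    """Return the label of the best-matching template."""
    return max(templates, key=lambda label: match_score(glyph, templates[label]))
```

In practice the glyphs would first be segmented and size-normalized so that every candidate is compared against templates on the same grid.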
Article
The main aim of our project is to develop a portable Raspberry Pi-based gadget for object detection with relative motion and distance. This technology converts a sequence of real-time objects into a series of text, which can be stored in a database and used to assist visually impaired people as well as for various security purposes. For that purpose, the conversion system is proposed in this project. Our system operates in two different modes: one detects the classes of nearby objects with the help of an R-CNN network, and the other detects obstacles using an ultrasonic sensor. The device includes three buttons for mode selection, and the system operates on the basis of the selected mode. It includes a camera to capture an image as input; the input image is then passed to the R-CNN, which recognizes the number of objects inside the image, their classes and types, and any text written inside, which can then be passed to a database for storage.
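The obstacle-detection mode's distance computation is standard for ultrasonic rangers: the sensor reports a round-trip echo time, so halving it and multiplying by the speed of sound gives the range. A sketch, assuming an HC-SR04-style sensor (the abstract only says "ultrasonic sensor"):

```python
SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C

def echo_to_distance_cm(echo_time_s: float) -> float:
    """Convert a round-trip ultrasonic echo time to distance in centimetres.

    The pulse travels out to the obstacle and back, so the one-way
    time is half the measured echo time.
    """
    one_way_s = echo_time_s / 2
    return one_way_s * SPEED_OF_SOUND_M_S * 100
```

A 10 ms echo thus corresponds to about 1.7 m, comfortably within the few-metres range such sensors cover.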
... Using machine learning technologies, speech synthesis in TTS has supported the artificial rendering of human-like speech in computer systems. The main purpose of text-to-speech synthesis is to convert text, consisting of natural language sentences, into the spoken form of the identical text as produced by a native speaker of the language [8,111]. As newer applications grow in the market, the need for speech synthesis has increased. ...
Article
Full-text available
Text-to-speech (TTS) systems have come a long way in the last decade and are now a popular research topic for creating various human-computer interaction systems. A range of speech synthesis models for various languages and motive applications is available, based on domain requirements. Recent developments in speech synthesis have primarily been attributed to deep learning-based techniques, which have improved a variety of application scenarios, including intelligent speech interaction, chatbots, and conversational artificial intelligence (AI). This survey article discusses text-to-speech systems as an active topic of study that has achieved significant progress in the recent decade, particularly for Indian and non-Indian languages. Furthermore, the study covers the lifecycle of text-to-speech systems as well as the platforms developed for them. We performed an efficient search for survey articles published up to May 2021 in Web of Science, PubMed, Scopus, EBSCO (Elton B. Stephens Company) and Google Scholar on text-to-speech systems in various languages based on different approaches. This survey article offers a study of the contributions made by various researchers to Indian and non-Indian language text-to-speech systems and the techniques used to implement them, along with the associated challenges in designing TTS systems. The work also compares different language text-to-speech systems based on quality metrics such as recognition rate, accuracy, TTS score, precision, recall, and F1-score. Further, the study summarizes existing ideas and their shortcomings, emphasizing the scope of future research in Indian and non-Indian language TTS, which may assist beginners in designing robust TTS systems.
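The quality metrics the survey compares — precision, recall, and F1-score — reduce to overlap counts between a system hypothesis and a reference. A minimal sketch of how they might be computed at the token level (the survey itself does not prescribe an implementation):

```python
from collections import Counter

def precision_recall_f1(predicted, reference):
    """Token-level precision, recall, and F1 between a hypothesis and a
    reference transcription, using multiset overlap of tokens."""
    pred, ref = Counter(predicted), Counter(reference)
    overlap = sum((pred & ref).values())          # tokens found in both
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Precision penalizes spurious tokens in the hypothesis, recall penalizes missed tokens in the reference, and F1 is their harmonic mean.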
... So, it leads to an inaccessible state. [10][11][12][13] suggested an attributes-based approach that yields a low-dimensional, fixed-length aggregated representation of word images and strings, so they are fast to compare and compute. It allows us to perform queries such as query-by-string. ...
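The fixed-length attribute representation the excerpt refers to can be illustrated with a toy version: embed word strings into one shared vector space and rank by cosine similarity, which is what makes query-by-string cheap. The single-level character-presence embedding below is a deliberately simplified stand-in for the real attribute representations in the cited work:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def embed(word):
    """Toy attribute vector: which letters of the alphabet occur in the word."""
    word = word.lower()
    return [1.0 if c in word else 0.0 for c in ALPHABET]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def query_by_string(query, lexicon):
    """Rank lexicon entries by embedding similarity to the query string."""
    q = embed(query)
    return max(lexicon, key=lambda w: cosine(q, embed(w)))
```

Because every word, whether it originated as an image or as a typed string, maps to the same fixed-length vector, retrieval is a nearest-neighbour search rather than a per-pair image comparison.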
Article
Full-text available
The lack of Braille resources in this advanced world has tied the hands of visually impaired people and kept them from soaring. This paper takes those users into consideration and presents a solution that helps every individual, especially the blind, in reading books and text from real-time images. The solution converts text obtained from text documents and real-world entities into aural output, which lends a hand in reading text. The main idea is to build software using a novel methodology in which an OCR engine receives input images and converts them into intermediate textual output that is given to Google Translate to produce audio output via earphones.
... Technology is developing very fast and making things easier (Arlinwibowo, Retnawati, Kartowagiran, & Kassymova, 2020;Hapsari et al., 2018). With technology, text information can be converted into voice so that it can be accessed by visually impaired people (Edward et al., 2018;Shetake et al., 2014;Venkateswarlu et al., 2016). However, changing text to audio cannot be implemented immediately because visually impaired people have different orientations (Pring, 2008;Salisbury, 2008) and experiences (Pandey, 2018). ...
Article
Full-text available
This study aims to develop an Android-based math exercises application for the visually impaired. This is development research carried out in the following steps: (1) preliminary research, (2) prototyping stage, and (3) assessment phase. The research was conducted between April 2020 and December 2020. The material chosen for the application was plane figures as taught in grade 8. The research process involved six experts in assessing the product: three mathematics education experts to assess the validity of the mathematical content, two blind-education experts to assess content suitability and accessibility for the visually impaired, and one IT expert to assess product performance. The product was tested on nine visually impaired students. The quality of the teaching materials is based on three basic aspects: feasibility, practicality, and effectiveness. The conclusions of this study are: (1) the product has good quality, because it has been declared feasible by experts and practical, as seen from the enthusiastic response and student testimonials, and is effective because it can be used both to learn and to measure abilities; (2) the application is divided into three sections: a preamble (containing the opening tune and instructions for use), practice questions, and results, with development based on two elements, namely accessibility and compatibility of the content with the cognition of the visually impaired; (3) the question page consists of the question (read aloud when entering the page and repeated when the user taps the question section) with the question number beneath it, answer choices arranged two by two (read aloud when pressed by the user), and an answer-lock button at the very bottom; and (4) the visually impaired want an application that has a simple operating system, challenges the user, and serves two functions, namely measuring their abilities and facilitating their learning.
... The whole work is implemented using a Raspberry Pi, which helps to read the text and convert it into audio form. Several research works have been carried out to detect characters using optical character recognition, and the detected text has been converted to speech using different text-to-speech converters [12][13][14][15][16]. ...
... On our planet of 6.5 billion humans, 265 million are visually impaired, of whom 39 million are completely blind, i.e. have no vision at all, and 225 million have mild or severe visual impairment (WHO, 2010). It has been projected that by 2022 these numbers will rise to 75 million blind and 200 million people with visual impairment [7]. ...
Article
Full-text available
The idea presented in this paper is proposed as an application of OCR. It acts as a lifesaver for visually challenged people. The key feature of this system is its ability to capture an image of a real-world environment using a camera and recognize the characters present in the captured image. The setup is constructed using OpenCV. The identified characters are converted into an audio output that is helpful for visually challenged people. The characters identified in the image captured by the device are converted to a string using Tesseract. The string is then converted to speech using a text-to-speech (TTS) module. An important feature of this OCR system is that the whole system is made portable; in a way, it acts as artificial vision for the blind through the audio output generated by the system.
... Automatic Speech Recognition (ASR) is the process by which computers transform a speech signal into a sequence of words and phrases [11], known as speech-to-text conversion. Figure 3 presents the basic architecture of a speech recognition system, where phonemes are the basic units of acoustic information in feature extraction. ...
... IBM calls this training a skill, where a conversational model is established consisting of entities, intents, and dialogues. a) Intents: The intents, as mentioned in [11], represent the purpose of a user's entry. An intent is defined for each type of user request that the application is required to support. ...
... The image processing module captures the image and converts it into text, and the voice processing module converts the text into audio. To overcome this limitation, a convolutional neural network is proposed in this paper for reading medicine names [15][16][17][18][19]. In Section II the methodology is discussed, followed by simulation results in Section III. ...
... Lip tracking is done based on references from previous data, so it is much more reliable and requires fewer resources than lip finding. The Active Appearance Model (AAM) is employed to extract the locations of specific points of the face from every frame of the video sequence [5]. The shape information extracted by the AAM from the face image is used to compute a set of suitable parameters that describe the appearance of facial features. ...