Construction of a Voice Driven Life Assistant System for Visually Impaired People
Runze Chen, Zhanhong Tian, Hailun Liu, Fang Zhao, Shuai Zhang, Haobo Liu
School of Software Engineering
Beijing University of Posts and Telecommunications
Beijing, China
e-mail: chenrz925@bupt.edu.cn
Abstract—The rapid development of artificial intelligence and mobile computing brings a more convenient life to blind and visually impaired people. This paper presents a prototype of a voice assistant designed specifically for them. The system contains fundamental services including fall detection, safety care, mobile phone accessibility, daily information broadcasting, and view description to make their lives easier. Natural language understanding, voice recognition, and speech synthesis have been integrated to let users operate the majority of a mobile phone's functions by voice. In addition, the built-in fall detection algorithm based on a tri-axis accelerometer and the object detection algorithm based on Mask R-CNN enrich the user's perception of the surroundings while keeping the user safe.
Keywords—voice assistant; navigation; visually impaired; natural language understanding; accessibility; mobile computing
I. INTRODUCTION
With the rapid development of artificial intelligence and mobile computing, modern technology has brought more convenience to blind and visually impaired people. It is estimated that about 253 million people live with vision impairment [1]. Visually impaired people face an inconvenient daily life without assistance from family or friends. One effective method of guidance is the guide dog; however, guide dogs require a great deal of money and time to train and keep. Blind people also need a way to learn about life outside their homes, and they desire access to the Internet and mobile services like everyone else. Many obstacles remain, however, that call for improvement from society and technology, including the lack of information resources for the blind, inadequate infrastructure, and insufficient technical investment [2]. According to our research, the rapid development of artificial intelligence and mobile computing technology can be an ideal way to help blind and visually impaired people perceive their surroundings.
Many solutions have been proposed to assist blind and visually impaired people. Some of them design hardware systems that provide fundamental functions. For example, Mohamed Manoufali et al. [3] designed a cane for the blind with obstacle detection based on an ultrasonic sensor, and Siti Fauziah Toha et al. [4] proposed a similar idea for assistance. However, such solutions cannot detect the objects around the user. Another type of solution provides guidance and services to blind users. Song J. et al. [5] constructed a guide device with some fundamental functions, but the user has to press buttons to use the services, and both the user experience and the functionality are limited.
In this paper, we present "beEYE", an extendable system running on Android phones that provides functions for visually impaired people, including messaging, describing the street view, navigating to a certain place, and so on. We have integrated these discrete functions into a unified system with a voice interface for the blind. With our system, we hope to greatly improve their daily lives.
II. RELATED WORK
There are many solutions that simplify the way people interact with computers. Natural language understanding and voice recognition have become stable enough that blind people also have the chance to use mobile phones easily. To understand the user's intention and extract key information from a spoken sentence, natural language understanding technology must classify the intent and extract the entities from the raw sentence. Microsoft has released LUIS [6], a natural language understanding service that extracts the intent and entities from a sentence. An open-source project named Rasa NLU [7] can also classify intents and extract entities; however, Rasa NLU needs to be modified to understand Chinese text.
A fall detection system designed by Wang Rong et al. [8] provides a solution for detecting elderly people's movements. As a risk warning service, fall detection can likewise protect blind and visually impaired people and alert their families when an abnormal event happens. Kaiming He et al. created Mask R-CNN [10], an object detection method that extends Fast R-CNN [9]. Object detection technology can help blind users know what lies ahead in their walking direction, so we also integrated the Mask R-CNN algorithm into the system to describe the view in front of the blind user.
III. SYSTEM ARCHITECTURE
We have unified the different approaches into a single one with a voice interface, so that users only need to speak to the platform to receive services. The service provided by the system consists of five modules. To run the system, an Android device with GPS, a three-axis gyroscope, and an Internet connection is needed. Optionally, a Bluetooth headphone is
needed to provide a better experience to users. The overall structure of the modules in this study is displayed in Fig. 1.
Figure 1. Module diagram of the project.
When a user speaks a command or query to the device, the application connects to the back-end server, which provides a RESTful [11] service that decides which module to access based on the natural language command. The redirected command authenticates the user and either returns the requested information or invokes local modules to compute and react to the user's physical actions. Safety-related modules run independently to sense abnormal movements of blind users. When the user makes an abnormal movement, the platform immediately detects it and notifies the user's guardian with the location and the latest walking trail. The vision description module collects data from sensors and relies on algorithms deployed on the server for computation. Fig. 2 shows the deployment of the project.
Figure 2. Deployment diagram of the project.
A. Dialogue System
Chinese speech recognition technology has been well developed by iFLYTEK [12], so we integrated it into the platform to obtain raw natural Chinese sentences from the user's voice. However, the dictionary of the voice recognizer needs to be extended, because customized words such as the names in the user's contacts must also be recognized to support the accessibility services. For example, a wrongly transcribed homophone of a name cannot be matched in the contact database. To solve this problem, we locally extended the dictionary of the voice recognition module with limited user data to achieve better accuracy.
Another important problem is to understand the intent of visually impaired users and extract the key information from the expression parsed from the human voice. To classify the intent of a command, this study trained a model based on LUIS [6] that understands natural language with good accuracy and stability. For example, if the user says "I want to walk to Beijing University of Posts and Telecommunications", the model parses this command into the intent "Navigation", the entity "Beijing University of Posts and Telecommunications" of type "Location", and the transport entity "Walking". With the intent and entities, the navigation service can search for and plan a route to take the user to the requested place.
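As a rough illustration of this step, the sketch below parses a LUIS-style JSON reply into an intent plus a map of entities. The field names follow the LUIS v2 response layout ("topScoringIntent", "entities"); the ParsedCommand container and the function name are our own illustrative choices, not part of the paper's implementation.

import org.json.JSONObject

// Hypothetical container for a parsed command; the name is ours.
data class ParsedCommand(val intent: String, val entities: Map<String, String>)

// Parse a LUIS-style JSON reply into an intent plus typed entities.
// "topScoringIntent" and "entities" follow the LUIS v2 response layout;
// the exact schema depends on the API version in use.
fun parseLuisResponse(json: String): ParsedCommand {
    val root = JSONObject(json)
    val intent = root.getJSONObject("topScoringIntent").getString("intent")
    val entities = mutableMapOf<String, String>()
    val array = root.getJSONArray("entities")
    for (i in 0 until array.length()) {
        val e = array.getJSONObject(i)
        // Map entity type (e.g. "Location") to its surface text.
        entities[e.getString("type")] = e.getString("entity")
    }
    return ParsedCommand(intent, entities)
}

For the walking example above, this would yield the intent "Navigation" with entities mapping "Location" to "Beijing University of Posts and Telecommunications" and "Transport" to "Walking".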
Responding to the user with a human voice is the output of the system. When the user issues commands for different tasks in quick succession, a new response may be produced before the previous voice output has finished, and without any countermeasure the latest response would cut off the one currently being spoken. To deal with this problem, an asynchronous queue is placed as a buffer for the responses, which also reduces the coupling between the voice interface and the various services. To make sure a request is received completely, the recorder can be woken up by the user's touch anywhere on the screen, and the device vibrates to notify the user when recording starts. Fig. 3 shows the process of dealing with human voice.
Figure 3. Flowchart diagram of voice interface module.
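A minimal sketch of this buffering scheme is shown below, assuming a blocking queue and a single consumer thread; the speak callback stands in for the real speech synthesis call.

import java.util.concurrent.LinkedBlockingQueue
import kotlin.concurrent.thread

// Services enqueue text replies; a single daemon consumer thread feeds
// them to the synthesizer one at a time, so a new reply cannot cut off
// the one currently being spoken.
class ResponseQueue(private val speak: (String) -> Unit) {
    private val queue = LinkedBlockingQueue<String>()

    init {
        thread(isDaemon = true) {
            while (true) {
                // take() blocks until a response is available, keeping
                // playback strictly sequential.
                speak(queue.take())
            }
        }
    }

    // Called by any service module; returns immediately.
    fun post(response: String) = queue.put(response)
}

Because post() never blocks the caller, each service can hand off its reply and continue, which is the decoupling described above.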
B. Navigation & Security
The project provides walking and public transport routes for blind users. When the user commands the platform to navigate to a certain place, the platform schedules the best route and starts the backend navigation service. With the navigation service, users know when to change direction and how far they still need to walk. The AMap [13] service is integrated to supply the key data for navigation, so the platform can access more complete route information. When users need to know where they are, they can ask the platform, which provides an accurate answer. To avoid unexpected events, the navigation service also judges whether the user is walking in the right direction using the device's built-in compass, serving as a supplement to the blind cane.
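The direction check can be reduced to comparing the compass azimuth with the bearing of the current route segment, as in the sketch below; the 30-degree tolerance and the function names are illustrative assumptions, not values from the paper.

import kotlin.math.abs

// Tolerance is an assumed value, not taken from the paper.
const val HEADING_TOLERANCE_DEG = 30.0

// Compare the compass azimuth with the bearing of the current route
// segment; both angles are in degrees, measured clockwise from north.
fun isOnCourse(compassAzimuthDeg: Double, routeBearingDeg: Double): Boolean {
    // Smallest signed difference between two compass angles, in [-180, 180).
    val diff = ((compassAzimuthDeg - routeBearingDeg + 540.0) % 360.0) - 180.0
    return abs(diff) <= HEADING_TOLERANCE_DEG
}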
Related to the location and movement information, security is also important for users. This study implements a backend algorithm to judge the state of the user [8]; when an unexpected fall happens, the system immediately notifies the guardian by messaging the location and dialing. With real-time access to the tri-axis accelerometer, the algorithm estimates the user's attitude to ensure the user's safety. The attitude angles can be derived from the device's built-in tri-axis accelerometer using the following equations:
pitch = \arctan\left(\frac{A_x}{\sqrt{A_y^2 + A_z^2}}\right)    (1)

roll = \arctan\left(\frac{A_y}{\sqrt{A_x^2 + A_z^2}}\right)    (2)

yaw = \arctan\left(\frac{\sqrt{A_x^2 + A_y^2}}{A_z}\right)    (3)
In the above equations, pitch is the rotation angle around the Y axis, i.e., the backward tilt of the body; roll is the rotation angle around the X axis, i.e., the side-to-side tilt of the body; and yaw is the rotation angle around the Z axis, i.e., the left-right rotation of the body. The study collects pitch, roll, and yaw data, which can be used to train and analyze the normal range of the user's movement. To suppress noise in the data, a Kalman filtering algorithm is applied to improve the system's reliability.
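A direct transcription of equations (1)-(3) is sketched below, assuming gravity components (ax, ay, az) from the accelerometer; atan2 replaces arctan for numerical robustness, and the Kalman smoothing step is omitted.

import kotlin.math.atan2
import kotlin.math.sqrt

data class Attitude(val pitch: Double, val roll: Double, val yaw: Double)

// Compute equations (1)-(3) from the gravity components (ax, ay, az)
// reported by the tri-axis accelerometer; angles are in radians.
fun attitudeFromAccel(ax: Double, ay: Double, az: Double): Attitude {
    val pitch = atan2(ax, sqrt(ay * ay + az * az))  // rotation around Y
    val roll = atan2(ay, sqrt(ax * ax + az * az))   // rotation around X
    val yaw = atan2(sqrt(ax * ax + ay * ay), az)    // rotation around Z
    return Attitude(pitch, roll, yaw)
}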
To better ensure the safety of the user, we collect extra data from different sensors, namely angular velocity (from the gyroscope) and acceleration (from the accelerometer), and improve this method. With these data, we calculate the acceleration vector sum (AVS) and the angular velocity vector sum (AVVS) to detect the user's movement using equations (4) and (5).
AVS = \sqrt{a_x^2 + a_y^2 + a_z^2}    (4)

AVVS = \sqrt{\omega_x^2 + \omega_y^2 + \omega_z^2}    (5)
With several sets of data collected consecutively, the system raises an alarm that a fall has happened when half of the data sets exceed the threshold. If almost all of the data sets exceed the threshold, the system also issues a notification asking whether the user wants to stop the alarm within five seconds.
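The voting rule can be sketched as follows, using the threshold of 25 mentioned in the tests of Section IV; the window size, and treating a sample as "exceeding" when either AVS or AVVS passes the threshold, are our assumptions.

import kotlin.math.sqrt

// Threshold of 25 follows the value used in the tests (Section IV).
const val FALL_THRESHOLD = 25.0

fun avs(ax: Double, ay: Double, az: Double): Double =
    sqrt(ax * ax + ay * ay + az * az)        // equation (4)

fun avvs(wx: Double, wy: Double, wz: Double): Double =
    sqrt(wx * wx + wy * wy + wz * wz)        // equation (5)

// Each element pairs a sample's AVS with its AVVS. A fall is reported
// when more than half of the consecutive samples exceed the threshold.
fun isFallDetected(samples: List<Pair<Double, Double>>): Boolean {
    val exceeding = samples.count { (a, w) -> a > FALL_THRESHOLD || w > FALL_THRESHOLD }
    return exceeding > samples.size / 2
}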
C. Accessibility
The accessibility module provides a unified approach to accessing the functions of the mobile phone. The platform currently provides the features displayed in Fig. 4.
Figure 4. Contents of accessibility service.
To provide the above features, we developed a solution that depends on the accessibility service of the Android operating system, so the platform can work with applications provided by other companies, such as WeChat.
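A minimal skeleton of such an Android accessibility hook is sketched below; the class name is illustrative, and the real module would add logic in onAccessibilityEvent to read the foreground app's views and hand the extracted text to the voice output.

import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityEvent

// Skeleton of an Android accessibility service; the class name is ours.
// The real module would traverse the source node tree of apps such as
// WeChat and speak the extracted text.
class BeEyeAccessibilityService : AccessibilityService() {
    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        event ?: return
        if (event.eventType == AccessibilityEvent.TYPE_WINDOW_CONTENT_CHANGED) {
            val source = event.source ?: return
            // Traverse `source` here to extract the text to be spoken.
            source.recycle()
        }
    }

    override fun onInterrupt() {
        // Required override; nothing to clean up in this sketch.
    }
}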
D. Information Service
Figure 5. Contents of information service.
The platform also provides a diversity of information services to facilitate access to daily information. We integrated several information APIs and developed some built-in modules to provide different kinds of information for users to query. Currently, the project provides the information services displayed in Fig. 5.
E. Vision Description
Vision description helps users learn more about the circumstances ahead. To some degree, it can also help blind users avoid dangers such as collisions with pedestrians or other obstacles on the sidewalk. We implemented and integrated Mask R-CNN [10] to recognize the objects captured by the device camera. Trained on the Cityscapes [14] dataset, the vision description module can describe common objects captured by the camera well, such as bicycles, cars, or pedestrians.
The vision description service starts when the user asks what is in front of them. The system automatically captures an image of the foreground and sends it to our back-end server, which computes the result. On receiving the result of the Mask R-CNN [10] model, the module lists all the detected objects and speaks them out.
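The client side of this round trip can be sketched as below, assuming the captured frame is available as JPEG bytes; the endpoint URL and the one-label-per-line reply format are illustrative assumptions, since the paper does not specify the web API.

import java.io.DataOutputStream
import java.net.HttpURLConnection
import java.net.URL

// POST a captured JPEG to the back-end detection service and read back
// the detected object labels. URL and reply format are assumptions.
fun describeView(jpegBytes: ByteArray): List<String> {
    val conn = URL("https://example.org/beeye/detect").openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Content-Type", "image/jpeg")
    DataOutputStream(conn.outputStream).use { it.write(jpegBytes) }
    return conn.inputStream.bufferedReader().use { reader ->
        reader.readLines().filter { it.isNotBlank() }  // e.g. "car", "person"
    }
}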
IV. TEST RESULTS
During the development of the prototype, we tested all the functions and features of the platform.
Figure 6. Walking route of volunteer.
We invited volunteers to experience and test the system. The volunteers were asked to wear a blindfold to experience the real conditions of blind or visually impaired users, and we tested on the campus of Beijing University of Posts and Telecommunications. A volunteer walked along the route in Fig. 6. When walking along the route calculated by the navigation service, the volunteer could successfully start and stop navigation, and the navigation service rerouted immediately when the direction suddenly changed.
During the navigation process, the volunteer commanded the system to describe the foreground. After a few seconds of delay, the server responded with the objects detected in the image captured by the camera. Because of camera shake, the volunteer needed to hold the phone steady to capture clearer images. We tested this service both on the street and in a classroom, and the algorithm produced good results. Fig. 7 displays the effect of the object detection algorithm.
We deployed the object detection service on a server with an NVIDIA Tesla P100 GPU and evaluated its performance when accessed via the web API we provide. We collected about 100 images randomly captured by volunteers to evaluate the performance of object detection. We performed the test in different Internet environments, each with a stable and available connection. The results are recorded in Table I.
TABLE I. COMPUTING TIME OF OBJECT DETECTION IN DIFFERENT INTERNET ENVIRONMENTS

Internet Connection | Average Computing Time (ms) | Average Server Computing Time (ms)
Indoor Wi-Fi        | 1582                        | 1276
4G (CMCC)           | 1707                        | 1310
4G (ChinaNet)       | 1910                        | 1315
Table I displays the time needed to obtain results from captured images. The average computing time includes both the communication time over the Internet and the running time of the backend object detection algorithm, while the average server computing time includes only the backend algorithm. We can infer from the results that the Internet connection is not the key factor behind the object detection process taking more than one second; a carefully simplified model that loses little detection accuracy could help improve efficiency.
The recognition of commands has also been tested by the team. To evaluate the performance of the dialogue system, we recorded the accuracy over conversations. Table II shows that the dialogue system as a whole can recognize and extract the intent and entities from the user's voice with good accuracy.
Figure 7. Effect of object detection algorithm.
However, the accuracy of entity recognition could still be improved if a larger dataset of natural sentences were available, and we are working to enlarge the sentence dataset to improve the accuracy.
TABLE II. ACCURACY OF TESTING THE VOICE INTERFACE

Function   | Intent Accuracy
Weather    | 98.9%
Navigation | 95.3%
Joke       | 99.5%
Message    | 97.6%
Dial       | 96.7%
Location   | 100.0%
Figure 8. AVS and AVVS when walking normally.
Figure 9. AVS and AVVS when walking upstairs.
Figure 10. AVS and AVVS when putting the mobile phone into a trouser pocket.
Figure 11. AVS and AVVS when falling backward.
The fall detection service has also been tested under different conditions; some of them are displayed in Figures 8 to 11. As the figures show, AVS and AVVS capture the movement of the device and help the system determine the state of the user when the sensor values exceed the threshold. It can be noticed that the change stays below the threshold when the user is walking normally, while the change is dramatic when a fall happens. When testing the algorithm, we set the threshold of both AVS and AVVS to 25, and the tests produced reasonable results.
V. CONCLUSION
This paper implemented a prototype of a voice assistant that provides daily services to blind and visually impaired people. The overall architecture of the platform is displayed in Fig. 12. As a supplement to the blind cane, the system can help blind people use mobile information services easily and partially keep users safe. However, the dream of helping blind people walk without a cane or guide dog still requires much more work. With this prototype of a voice assistant, we hope that blind and visually impaired people can enjoy a more convenient daily life.
Figure 12. Conceptual figure of the project.
The system provides a complete outdoor navigation service, but it still has issues such as the lack of indoor navigation. Some solutions exist, but they require additional hardware components to supply the key data to the navigation algorithms. The security functions need more development, and the system needs more hardware devices integrating sensors with higher accuracy.
The system also has limitations in the functions and features of the accessibility and information services; more information sources can be collected and integrated into the system in future work. The performance of the prototype can also be improved, as can the algorithms in some modules, for example natural language understanding, object detection, and obstacle detection. More algorithms can be tested and researched to achieve better performance and accuracy in the functions of the project.
ACKNOWLEDGMENT
This work was supported by the Research Innovation Fund for College Students of Beijing University of Posts and Telecommunications.
REFERENCES
[1] World Health Organization. (2017). "Vision impairment and
blindness." Retrieved March 16, 2018, from
http://www.who.int/mediacentre/factsheets/fs282/en/.
[2] Ying, Z. and G. Chaobing (2014). "Research on Obstacles of
Information Acquisition for the Blind in China." Journal of Modern
Information(07): 10-13.
Manoufali, M., et al. (2011). Smart guide for blind people. 2011 International Conference and Workshop on the Current Trends in Information Technology, CTIT'11, October 26-27, 2011, Dubai, United Arab Emirates, IEEE Computer Society.
[4] Mutiara, G. A., et al. (2016). Smart guide extension for blind cane.
4th International Conference on Information and Communication
Technology, ICoICT 2016, May 25, 2016 - May 27, 2016, Bandung,
Indonesia, Institute of Electrical and Electronics Engineers Inc.
Song, J., et al. (2016). "The design of a guide device with multi-function to aid travel for blind person." International Journal of Smart Home 10(4): 77-86.
[6] Microsoft (2018). "LUIS." Retrieved March 16, 2018, from
https://www.luis.ai/.
[7] Rasa Technologies GmbH. (2018). "Rasa NLU." Retrieved March 16,
2018, from https://nlu.rasa.ai/.
[8] Rong, W., et al. (2012). "Design and implementation of fall detection
system using tri-axis accelerometer." Journal of Computer
Applications(05): 1450-1452+1456.
[9] Girshick, R. (2015). Fast R-CNN. 15th IEEE International
Conference on Computer Vision, ICCV 2015, December 11, 2015 -
December 18, 2015, Santiago, Chile, Institute of Electrical and
Electronics Engineers Inc.
[10] He, K., et al. (2017). Mask R-CNN. 16th IEEE International
Conference on Computer Vision, ICCV 2017, October 22, 2017 -
October 29, 2017, Venice, Italy, Institute of Electrical and Electronics
Engineers Inc.
[11] Fielding, R. T. (2000). Architectural styles and the design of network-
based software architectures, University of California, Irvine: xvii,
162 leaves.
[12] iFLYTEK. "iFLYTEK." Retrieved March 15, 2018, from
http://www.xfyun.cn/.
[13] AMap. "AMap." Retrieved March 18, 2018, from
http://lbs.amap.com/.
[14] Cordts, M., et al. (2016). The Cityscapes Dataset for Semantic Urban
Scene Understanding. 29th IEEE Conference on Computer Vision
and Pattern Recognition, CVPR 2016, June 26, 2016 - July 1, 2016,
Las Vegas, NV, United states, IEEE Computer Society.