Construction of a Voice Driven Life Assistant System for Visually Impaired People
Runze Chen, Zhanhong Tian, Hailun Liu, Fang Zhao, Shuai Zhang, Haobo Liu
School of Software Engineering
Beijing University of Posts and Telecommunications
Beijing, China
e-mail: chenrz925@bupt.edu.cn
Abstract—The rapid development of artificial intelligence and mobile computing brings a more convenient life to blind and visually impaired people. This paper presents a prototype of a voice assistant specially designed for them. The system provides fundamental services including fall detection, safety care, mobile phone accessibility, daily information broadcasting, and view description to make daily life easier. Natural language understanding, voice recognition, and speech synthesis have been integrated so that users can operate the majority of a mobile phone's functions. In addition, a built-in fall detection algorithm based on a tri-axis accelerometer and an object detection algorithm based on Mask R-CNN enrich the users' perception of their surroundings while keeping them safe.
Keywords-voice assistant; navigation; visually impaired;
natural language understanding; accessibility; mobile computing
I. INTRODUCTION
With the rapid development of artificial intelligence and mobile computing, modern technology has brought more convenience to blind and visually impaired people. It is estimated that about 253 million people live with vision impairment [1]. Without assistance from family or friends, visually impaired people face an inconvenient daily life. One effective method of guidance is the guide dog; however, guide dogs require considerable money and time to train and care for. Blind people also need a way to learn about life outside their homes, and they want access to the Internet and mobile services like everyone else. For them, however, many obstacles remain that call for improvement from society and technology, including a lack of information resources for the blind, inadequate infrastructure, and a lack of technical investment [2]. According to our research, the rapid development of artificial intelligence and mobile computing technology can be an ideal way to help blind and visually impaired people perceive their surroundings.
Many solutions have been proposed to assist blind and visually impaired people. Some design hardware systems that provide fundamental functions. For example, Mohamed Manoufali et al. [3] designed a cane for the blind with ultrasonic obstacle detection, and Siti Fauziah Toha et al. [4] proposed a similar assistance idea. However, those solutions cannot detect the objects around the user. Another type of solution provides guidance and services to blind users. Jiayin S. et al. [5] built a guide device with some fundamental functions, but the user must press buttons to use those services, and the user experience and functionality are limited.
In this paper, we present "beEYE", an extensible system running on Android phones that provides functions for visually impaired people, including messaging, describing the street view, and navigating to a given place. We have integrated these discrete functions into a unified system with a voice interface for the blind. With our system, we hope to greatly improve their daily life.
II. RELATED WORK
Many solutions exist to simplify the way people interact with computers. Natural language understanding and voice recognition have become stable enough that blind people also have the chance to use mobile phones easily. To understand the user's intention and extract key information from spoken sentences, natural language understanding technology must classify the intent and extract the entities from the raw sentence. Microsoft has released LUIS [6], a natural language understanding service that extracts the intent and entities from a sentence. An open-source project named Rasa NLU [7] can also classify intents and extract entities; however, Rasa NLU needs to be modified to understand Chinese text.
A fall detection system designed by Wang Rong et al. [8] provides a solution for detecting elderly people's movement. As a risk-warning service, fall detection can also protect blind and visually impaired people and alert their family when an abnormal event happens. Kaiming He et al. created Mask R-CNN [10], an object detection method that extends Fast R-CNN [9]. Object detection can help blind people know what appears in their walking direction, so we integrated the Mask R-CNN algorithm into the system to describe the view in front of the blind user.
III. SYSTEM ARCHITECTURE
We have unified the different approaches into a single platform with a voice interface, so users only need to speak to the platform to get a service. The system consists of five modules. To run this system, an Android device with GPS, a three-axis gyroscope, and an Internet connection is needed. Optionally, a Bluetooth headphone provides a better experience. The overall structure of the modules in this study is displayed in Fig. 1.

International Conference on Artificial Intelligence and Big Data
978-1-5386-6987-7/18/$31.00 ©2018 IEEE
Figure 1. Module diagram of the project.
When a user speaks a command or query to the device, the application connects to the back-end server, which provides a RESTful [11] service that decides which module to access based on the natural language command. The redirected command authenticates the user and either provides the requested information or invokes local modules to compute or react to the user's physical action. Some safety-related modules run independently to sense abnormal movement of blind users. When the user makes an abnormal action, the platform can immediately detect it and notify the user's guardian with the location and latest walking trail. The vision description module collects data from sensors and requires the computation of algorithms deployed on the server. Fig. 2 shows the deployment of the project.
Figure 2. Deployment diagram of the project.
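The back-end routing step can be pictured as a table from intent names to module handlers. The sketch below is purely illustrative: the intent and module names, and the `dispatch` helper itself, are assumptions, not the authors' actual API.

```python
# Hypothetical sketch of the back-end dispatch step: the intent name produced
# by the NLU service is mapped to the module that handles it. The module and
# intent names here are illustrative, not the authors' actual API.

def dispatch(intent, entities):
    handlers = {
        "Navigation": lambda e: "routing to " + e.get("Location", "unknown"),
        "Weather": lambda e: "fetching weather for " + e.get("Location", "here"),
        "Dial": lambda e: "dialing " + e.get("Contact", "unknown"),
    }
    handler = handlers.get(intent)
    if handler is None:
        # unknown intents fall back to a spoken error prompt
        return "unrecognized command"
    return handler(entities)

print(dispatch("Navigation", {"Location": "BUPT", "Transport": "Walking"}))
# → routing to BUPT
```

Keeping the mapping in one place lets new service modules be registered without touching the voice front end.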
A. Dialogue System
Chinese speech recognition technology has been well developed by iFLYTEK [12], so we integrated it into the platform to obtain raw natural Chinese sentences from the users' voice. However, the dictionary of the voice recognizer needs to be extended, because customized words such as names in the contact list must also be recognized to support the accessibility services; for example, a wrongly transcribed homophonic name cannot be found in the contact database. To solve this problem, we locally extended the dictionary of the voice recognition module with limited user data to achieve better accuracy.
Another important problem is understanding the intent of visually impaired users and extracting the key information from the expression parsed from the human voice. To classify the intent of a user's command, this study trained a model based on LUIS [6] that understands natural language with good accuracy and stability. For example, if the user says "I want to walk to Beijing University of Posts and Telecommunications", the model parses this command into the intent "Navigation", the entity "Beijing University of Posts and Telecommunications" of type "Location", and the transport entity "Walking". With the intent and entities, the navigation service can search for and plan a route in order to navigate the user to the desired place.
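For the example sentence above, extracting the intent and its entities amounts to reading two fields from the NLU service's JSON reply. The field names in this sketch follow the LUIS v2 response shape; treat them as an assumption if the endpoint version differs.

```python
import json

# Sketch of extracting the intent and entities from a LUIS-style JSON reply.
# The "topScoringIntent" / "entities" field names follow the LUIS v2 response
# shape and are an assumption here, not confirmed by the paper.
reply = json.loads("""
{
  "query": "I want to walk to Beijing University of Posts and Telecommunications",
  "topScoringIntent": {"intent": "Navigation", "score": 0.97},
  "entities": [
    {"entity": "Beijing University of Posts and Telecommunications", "type": "Location"},
    {"entity": "Walking", "type": "Transport"}
  ]
}
""")

intent = reply["topScoringIntent"]["intent"]
# index entities by their type so the navigation service can look up slots
slots = {e["type"]: e["entity"] for e in reply["entities"]}
print(intent, "->", slots["Location"], "by", slots["Transport"])
```

The navigation module then only needs the `Location` and `Transport` slots to plan a route.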
Responding to the user with a human voice is the output of the system. When the user frequently gives the system commands to execute different tasks, a new response may be produced before the previous voice output finishes, and without any safeguard the latest response would interrupt the one currently being spoken. To deal with this problem, an asynchronous queue is placed as a buffer for the different responses, which also reduces the coupling between the voice interface and the diverse services. To make sure a request is completely received, the recorder can be woken up by the user's touch anywhere on the screen, and the device vibrates to notify the user when recording starts. Fig. 3 shows the process of dealing with the human voice.
Figure 3. Flowchart diagram of voice interface module.
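The buffering idea can be sketched with a thread-safe queue and a single speaker worker, so replies are voiced strictly in arrival order. This is a minimal sketch, assuming a `speak()` stand-in for the real text-to-speech call; the example replies are invented.

```python
import queue
import threading

# Minimal sketch of the response buffer: services enqueue text replies and a
# single worker speaks them in arrival order, so a new reply never interrupts
# the one currently being synthesized. speak() is a stand-in for the TTS call.

spoken = []

def speak(text):
    spoken.append(text)  # real system: hand the text to the speech synthesizer

responses = queue.Queue()

def speaker_worker():
    while True:
        text = responses.get()
        if text is None:  # sentinel: shut the worker down
            break
        speak(text)

worker = threading.Thread(target=speaker_worker)
worker.start()

# Two services answer almost simultaneously; both replies are spoken in order.
responses.put("Route planned. Walk straight for 200 meters.")
responses.put("New message from Zhang Wei.")
responses.put(None)
worker.join()
print(spoken)
```

Because services only ever `put` onto the queue, they stay decoupled from the voice interface, as the text above describes.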
B. Navigation & Security
The project provides walking routes and public transport transfers for blind users. When the user commands the platform to navigate to a certain place, the platform schedules the best route and starts the backend navigation service. With the navigation service, users know when to change direction and how far they need to walk. The AMap [13] service is integrated to provide the key data for navigation, so the platform can access more complete route information for users. When users need to know where they are, they can ask the platform, and it provides an accurate answer. To avoid unexpected events, the navigation service also judges whether the user is walking in the right direction using the device's built-in compass, as a supplement to the blind cane.
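The direction check reduces to comparing the compass heading against the bearing of the current route segment, with care for wrap-around at 360 degrees. In this sketch, the 30-degree tolerance is an illustrative choice, not a value from the paper.

```python
# Sketch of the compass-based direction check: compare the device heading
# with the bearing of the current route segment and warn on drift.
# The 30-degree tolerance is an assumed, illustrative value.

def off_course(heading_deg, bearing_deg, tolerance=30.0):
    # smallest angular difference, handling wrap-around at 360 degrees
    diff = abs(heading_deg - bearing_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff > tolerance

print(off_course(10.0, 350.0))   # 20 degrees apart: still on course
print(off_course(90.0, 350.0))   # 100 degrees apart: warn the user
```

The wrap-around step matters: headings of 10° and 350° are only 20° apart, not 340°.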
Related to location and movement information, security is also important for users. This study implements a backend algorithm to judge the state of the user [8]; when an unexpected fall happens, the system immediately notifies the guardian by messaging the location and dialing. With real-time access to the tri-axis accelerometer, the algorithm can estimate the user's attitude to ensure their safety. The attitude angles can be determined from the built-in tri-axis accelerometer using the following equations.
pitch = arctan( A_x / sqrt(A_y^2 + A_z^2) )    (1)
roll = arctan( A_y / sqrt(A_x^2 + A_z^2) )     (2)
yaw = arctan( sqrt(A_x^2 + A_y^2) / A_z )      (3)
In the above equations, pitch is the rotation angle around the Y axis, i.e., the angle of the body's backward tilt; roll is the rotation angle around the X axis, i.e., the body's side-to-side tilt; and yaw is the rotation angle around the Z axis, i.e., the rotation of the body from left to right. The study collects pitch, roll, and yaw data, which are used to train and analyze the normal range of the user's movement. To suppress noise in the data, a Kalman filter is applied to improve the system's reliability.
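The attitude-angle computation above can be sketched directly from accelerometer readings. This is a minimal sketch assuming raw readings (ax, ay, az) in any consistent unit; `atan2` with a non-negative second argument equals the arctan of the ratio in the equations.

```python
import math

# Worked example of the attitude-angle equations above, from raw
# accelerometer readings (ax, ay, az) in any consistent unit.

def attitude(ax, ay, az):
    pitch = math.atan2(ax, math.hypot(ay, az))  # rotation around the Y axis
    roll = math.atan2(ay, math.hypot(ax, az))   # rotation around the X axis
    yaw = math.atan2(math.hypot(ax, ay), az)    # rotation around the Z axis
    return math.degrees(pitch), math.degrees(roll), math.degrees(yaw)

# Phone lying flat: gravity entirely on the Z axis, so all angles are zero.
print(attitude(0.0, 0.0, 9.8))  # → (0.0, 0.0, 0.0)
```

In practice these raw angles would then be smoothed by the Kalman filter before thresholding, as the text describes.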
To better ensure the user's safety, we collect extra data from different sensors, including angular velocity (from the gyroscope) and acceleration (from the accelerometer), and improve this method. With these data, we calculate the acceleration vector sum (AVS) and the angular velocity vector sum (AVVS) to detect the user's movement, using equations (4) and (5).
AVS = sqrt(a_x^2 + a_y^2 + a_z^2)     (4)
AVVS = sqrt(ω_x^2 + ω_y^2 + ω_z^2)    (5)
With several sets of data collected consecutively, the system raises an alarm that a fall has happened when half of the data sets exceed the threshold. If almost all of the data sets exceed the threshold, the system gives a notification asking whether the user wants to stop the alarm within five seconds.
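The voting rule above can be sketched as follows. The threshold of 25 follows the value the paper reports using in its tests; the sample values and the exact voting helper are illustrative assumptions.

```python
import math

# Sketch of the fall-detection voting rule: over several consecutive samples,
# raise an alarm when at least half of the (AVS, AVVS) pairs exceed the
# threshold. The threshold of 25 follows the paper's tests; the sample data
# below is invented for illustration.

THRESHOLD = 25.0

def avs(ax, ay, az):
    # acceleration vector sum, equation (4); AVVS is computed the same way
    # from gyroscope readings
    return math.sqrt(ax * ax + ay * ay + az * az)

def is_fall(samples, threshold=THRESHOLD):
    # samples: list of (AVS, AVVS) pairs collected consecutively
    exceeded = sum(1 for a, w in samples if a > threshold or w > threshold)
    return exceeded >= len(samples) / 2

normal = [(9.8, 1.2), (10.1, 0.9), (9.7, 1.5), (10.0, 1.1)]
fall = [(40.2, 31.5), (38.7, 29.0), (12.3, 8.4), (35.1, 27.8)]
print(is_fall(normal), is_fall(fall))  # → False True
```

Voting over a window rather than reacting to a single spike is what keeps ordinary movements, such as walking upstairs, from triggering false alarms.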
C. Accessibility
The accessibility module provides a unified approach to accessing the functions of the mobile phone. The platform currently provides the features displayed in Fig. 4.
Figure 4. Contents of accessibility service.
To provide the above features, we developed a solution based on the accessibility service of the Android operating system, and the platform works together with applications from other companies, such as WeChat.
D. Information Service
Figure 5. Contents of information service.
The platform also provides a diversity of information services to facilitate access to daily-life information. We integrated some information APIs and developed some built-in modules to provide different kinds of information for users to query. Currently, the project provides the information services displayed in Fig. 5.
E. Vision Description
Vision description helps users know more about the circumstances ahead. To some degree, it can also help blind users avoid dangers such as collisions with pedestrians or other facilities on the sidewalk. We implemented and integrated Mask R-CNN [10] to recognize the objects captured by the device's camera. Trained on the Cityscapes [14] dataset, the vision description module can describe certain objects captured by the camera well, such as bicycles, cars, or pedestrians.
The vision description service starts when the user asks what is in front of them. The system automatically captures an image of the foreground and sends it to our back-end server to compute the result. When the result of the Mask R-CNN [10] model arrives, the module lists all the detected objects and speaks them out.
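The final step, turning the model's detection list into a spoken sentence, can be sketched as a simple aggregation. The labels and the sentence template here are illustrative assumptions; the real module works on the model's per-instance class output.

```python
from collections import Counter

# Sketch of turning a Mask R-CNN detection list into a sentence for the
# speech synthesizer. Labels and phrasing are illustrative, not the paper's.

def describe(detections):
    counts = Counter(detections)  # e.g. {"car": 2, "pedestrian": 1}
    if not counts:
        return "Nothing recognized ahead."
    parts = [f"{n} {label}" + ("s" if n > 1 else "") for label, n in counts.items()]
    return "Ahead of you: " + ", ".join(parts) + "."

print(describe(["car", "car", "pedestrian", "bicycle"]))
# → Ahead of you: 2 cars, 1 pedestrian, 1 bicycle.
```

Grouping duplicate labels keeps the spoken description short, which matters when the output is voice rather than a screen.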
IV. TEST RESULTS
During the development of the prototype, we tested all the functions and features of the platform.
Figure 6. Walking route of volunteer.
We invited volunteers to experience and test the system. The volunteers were asked to wear a blindfold to experience the real conditions of blind or visually impaired users, and we tested on the campus of Beijing University of Posts and Telecommunications. A volunteer walked along the route in Fig. 6. When walking along the route calculated by the navigation service, the volunteer could successfully start and stop the navigation, and the navigation service could immediately reroute when the walking direction suddenly changed.
During the navigation process, the volunteer commanded the system to describe the foreground. After a few seconds of delay, the server responded with the objects detected in the images captured by the camera. Because of camera instability, the volunteer needed to hold the phone steady to capture clearer images. We tested this service in street and classroom conditions, and the algorithm produced good results. Fig. 7 displays the effect of the object detection algorithm.
We deployed the object detection service on a server with an NVIDIA Tesla P100 GPU and evaluated its performance when accessed via our web API. We collected about 100 images randomly captured by volunteers to evaluate the performance of object detection. We performed the test in different Internet environments, each with a stable and available connection. The results are recorded in Table I.
TABLE I. COMPUTING TIME OF OBJECT DETECTION IN DIFFERENT INTERNET ENVIRONMENTS

Internet Connection | Average Computing Time (ms) | Average Server Computing Time (ms)
Indoor Wi-Fi        | 1582                        | 1276
4G (CMCC)           | 1707                        | 1310
4G (ChinaNet)       | 1910                        | 1315
Table I displays the time needed to obtain a result from the captured images. The average computing time includes both the communication over the Internet and the run of the backend object detection algorithm, while the average server computing time includes only the backend algorithm. We can infer from the results that the Internet connection is not the key factor making the object detection process take more than one second; a carefully simplified model that does not lose much detection accuracy could help improve efficiency.
The recognition of commands was also tested by the team. To evaluate the performance of the dialogue system, we recorded the accuracy of conversations. Table II shows that the whole dialogue system can recognize and extract the intent and entities from the users' voice with good accuracy.
Figure 7. Effect of object detection algorithm.
However, the accuracy of entity recognition could still be improved if a larger dataset of natural sentences were available, and we are working to enlarge the sentence dataset to improve accuracy.
TABLE II. ACCURACY OF TESTING THE VOICE INTERFACE

Function   | Intent Accuracy | Entity Accuracy
Weather    | 98.9%           | 91.9%
Navigation | 95.3%           | 88.6%
Joke       | 99.5%           | No entity
Message    | 97.6%           | 92.3%
Dial       | 96.7%           | 90.1%
Location   | 100.0%          | No entity
Figure 8. AVS and AVVS when walking normally.
Figure 9. AVS and AVVS when walking upstairs.
Figure 10. AVS and AVVS when putting the mobile phone into a trouser pocket.
Figure 11. AVS and AVVS when falling backward.
The fall detection service was also tested under different conditions, some of which are displayed in Fig. 8 through Fig. 11. As the figures show, AVS and AVVS can capture the movement of the device and help the system determine the user's state when the sensor values exceed the threshold. It can be noticed that the change stays below the threshold when the user is walking normally, while the change is dramatic when a fall happens. When testing the algorithm, we set the threshold of both AVS and AVVS to 25, and the test produced reasonable results.
V. CONCLUSION
This paper implemented a prototype of a voice assistant that provides daily services to blind and visually impaired people. The overall architecture of the platform is displayed in Fig. 12. As a supplement to the blind cane, the system can help blind people use mobile information services easily and partially keep users safe. However, the dream of helping blind people walk without a cane or guide dog still requires further work. With this prototype voice assistant, we hope blind and visually impaired people can enjoy a more convenient daily life.
Figure 12. Conceptual figure of the project.
The system provides a complete outdoor navigation service, but it still lacks indoor navigation. Some solutions exist, but they require extra hardware components to feed key data into the navigation algorithms. The security functions need more development, and the system needs more hardware devices that integrate sensors with higher accuracy.
The system also has some limitations in the functions and features of the accessibility and information services; more information sources can be collected and integrated into the system in future work. The performance of the prototype can also be improved, as can the algorithms in some modules, for example natural language understanding, object detection, and obstacle detection. More algorithms can be tested and researched to achieve better performance and accuracy in the project's functions.
ACKNOWLEDGMENT
This work was supported by the Research Innovation Fund for College Students of Beijing University of Posts and Telecommunications.
REFERENCES
[1] World Health Organization. (2017). "Vision impairment and
blindness." Retrieved March 16, 2018, from
http://www.who.int/mediacentre/factsheets/fs282/en/.
[2] Ying, Z. and G. Chaobing (2014). "Research on Obstacles of
Information Acquisition for the Blind in China." Journal of Modern
Information(07): 10-13.
[3] Manoufali, M., et al. (2011). Smart guide for blind people. 2011 International Conference and Workshop on the Current Trends in Information Technology, CTIT'11, October 26-27, 2011, Dubai, United Arab Emirates, IEEE Computer Society.
[4] Mutiara, G. A., et al. (2016). Smart guide extension for blind cane.
4th International Conference on Information and Communication
Technology, ICoICT 2016, May 25, 2016 - May 27, 2016, Bandung,
Indonesia, Institute of Electrical and Electronics Engineers Inc.
[5] Song, J., et al. (2016). "The design of a guide device with multi-function to aid travel for blind person." International Journal of Smart Home 10(4): 77-86.
[6] Microsoft (2018). "LUIS." Retrieved March 16, 2018, from
https://www.luis.ai/.
[7] Rasa Technologies GmbH. (2018). "Rasa NLU." Retrieved March 16,
2018, from https://nlu.rasa.ai/.
[8] Rong, W., et al. (2012). "Design and implementation of fall detection
system using tri-axis accelerometer." Journal of Computer
Applications(05): 1450-1452+1456.
[9] Girshick, R. (2015). Fast R-CNN. 15th IEEE International
Conference on Computer Vision, ICCV 2015, December 11, 2015 -
December 18, 2015, Santiago, Chile, Institute of Electrical and
Electronics Engineers Inc.
[10] He, K., et al. (2017). Mask R-CNN. 16th IEEE International
Conference on Computer Vision, ICCV 2017, October 22, 2017 -
October 29, 2017, Venice, Italy, Institute of Electrical and Electronics
Engineers Inc.
[11] Fielding, R. T. (2000). Architectural styles and the design of network-
based software architectures, University of California, Irvine: xvii,
162 leaves.
[12] iFLYTEK. "iFLYTEK." Retrieved March 15, 2018, from
http://www.xfyun.cn/.
[13] AMap. "AMap." Retrieved March 18, 2018, from
http://lbs.amap.com/.
[14] Cordts, M., et al. (2016). The Cityscapes Dataset for Semantic Urban
Scene Understanding. 29th IEEE Conference on Computer Vision
and Pattern Recognition, CVPR 2016, June 26, 2016 - July 1, 2016,
Las Vegas, NV, United states, IEEE Computer Society.