Fig. 10. Backus-Naur form (BNF) [8] definition of the device profile.
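The figure itself is not reproduced here. Purely as an illustration of what a BNF-style device profile of this kind could look like, the snippet below defines a hypothetical grammar with the third-party Lark parsing library; every rule name, field, and the sample profile string are invented and are not the grammar shown in the figure.

```python
# Hypothetical device-profile grammar, written for the `lark` parsing library.
# The rules and fields are illustrative assumptions, NOT the BNF from Fig. 10.
from lark import Lark

PROFILE_GRAMMAR = r"""
    profile: "device" NAME "{" display+ "}"
    display: "display" "{" field+ "}"
    field: "digits" ":" INT            -> digits
         | "rows"   ":" INT            -> rows
         | "label"  ":" ESCAPED_STRING -> label

    %import common.CNAME -> NAME
    %import common.INT
    %import common.ESCAPED_STRING
    %import common.WS
    %ignore WS
"""

parser = Lark(PROFILE_GRAMMAR, start="profile")
tree = parser.parse('device microwave { display { digits: 4 rows: 1 label: "timer" } }')
print(tree.pretty())   # parsed profile tree for the hypothetical description
```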

Source publication
Article
Full-text available
Many domestic appliances and much office equipment are controlled using a keypad and a small digital display. Programming such devices is problematic for blind and visually handicapped people. In this paper, we describe a device that may be used to read the displays on these devices. The device is designed to accept a description of the display being...

Citations

... However, ensuring accessibility to physical interfaces has posed an enduring challenge [43,97], as certain actions cannot be performed solely through cameras. For instance, the ability to turn on or off a light switch typically requires direct physical interaction. ...
Preprint
Full-text available
We introduce GeXSe (Generative Explanatory Sensor System), a novel framework designed to extract interpretable sensor-based and vision domain features from non-invasive smart space sensors. We combine these to provide a comprehensive explanation of sensor-activation patterns in activity recognition tasks. This system leverages advanced machine learning architectures, including transformer blocks, Fast Fourier Convolution (FFC), and diffusion models, to provide a more detailed understanding of sensor-based human activity data. A standout feature of GeXSe is our unique Multi-Layer Perceptron (MLP) with linear, ReLU, and normalization layers, specially devised for optimal performance on small datasets. It also yields meaningful activation maps to explain sensor-based activation patterns. The standard approach is based on a CNN model, which our MLP model outperforms. GeXSe offers two types of explanations: sensor-based activation maps and visual domain explanations using short videos. These methods offer a comprehensive interpretation of the output from non-interpretable sensor data, thereby augmenting the interpretability of our model. Utilizing the Fréchet Inception Distance (FID) for evaluation, it outperforms established methods, improving baseline performance by about 6%. GeXSe also achieves a high F1 score of up to 0.85, demonstrating precision, recall, and noise resistance, marking significant progress in reliable and explainable smart space sensing systems.
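To make the MLP design described above concrete, here is a minimal PyTorch sketch of a linear/ReLU/normalization stack for sensor feature vectors; the layer sizes, depth, use of LayerNorm, and the classification head are assumptions for illustration, not the architecture from the GeXSe paper.

```python
# Minimal sketch of a linear -> ReLU -> normalization MLP for sensor features.
# Layer sizes, depth, and the LayerNorm choice are illustrative assumptions.
import torch
import torch.nn as nn

class SensorMLP(nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_classes: int, depth: int = 3):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU(), nn.LayerNorm(hidden)]
            dim = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))

# Example: 64-dimensional sensor feature vectors, 6 activity classes.
model = SensorMLP(in_dim=64, hidden=128, n_classes=6)
logits = model(torch.randn(8, 64))   # batch of 8 feature vectors -> (8, 6) logits
```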
... For example, using optical character recognition (OCR), several systems (such as the KNFB Reader [43]) have been developed to assist blind people in reading visual text. Camera-based solutions, such as those attached to a table [23], worn on the finger [41,45], or held in the hand [39], have been proposed to recognize text in physical documents and allow blind people to hear and interact with them. ...
Preprint
Full-text available
Despite their growing popularity, many public kiosks with touchscreens are inaccessible to blind people. Toucha11y is a working prototype that allows blind users to use existing inaccessible touchscreen kiosks independently and with little effort. Toucha11y consists of a mechanical bot that can be instrumented to an arbitrary touchscreen kiosk by a blind user and a companion app on their smartphone. The bot, once attached to a touchscreen, will recognize its content, retrieve the corresponding information from a database, and render it on the user's smartphone. As a result, a blind person can use the smartphone's built-in accessibility features to access content and make selections. The mechanical bot will detect and activate the corresponding touchscreen interface. We present the system design of Toucha11y along with a series of technical evaluations. Through a user study, we found out that Toucha11y could help blind users operate inaccessible touchscreen devices.
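The bot-plus-smartphone flow described in this abstract can be summarised as a recognise / retrieve / render / actuate loop. The toy sketch below mocks that loop in Python; every name, data structure, and the hard-coded screen content are invented for illustration and are not the actual Toucha11y implementation or API.

```python
# Toy mock of a Toucha11y-style recognise/retrieve/render/actuate loop.
# All names and the hard-coded screen content are invented for illustration.
from dataclasses import dataclass

@dataclass
class ScreenItem:
    label: str   # text recognized on the kiosk screen
    x: int       # touch coordinates for the bot's actuator
    y: int

def recognize_screen() -> list[ScreenItem]:
    # Stand-in for the bot's camera + recognition + database lookup step.
    return [ScreenItem("Buy ticket", 120, 300), ScreenItem("Top up card", 120, 420)]

def render_on_smartphone(items: list[ScreenItem]) -> ScreenItem:
    # Stand-in for the companion app: the phone's screen reader would speak
    # these labels and return the user's selection; here we simply pick the first.
    for i, item in enumerate(items):
        print(f"{i}: {item.label}")
    return items[0]

def tap(item: ScreenItem) -> None:
    # Stand-in for driving the mechanical bot to press the chosen target.
    print(f"Bot taps ({item.x}, {item.y}) for '{item.label}'")

if __name__ == "__main__":
    tap(render_on_smartphone(recognize_screen()))
```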
... Locating the RoIs is a crucial step and impacts the final accuracy. Many works try to simplify the problem while assuming the localization step is somehow already done, either manually by a client, or by fixing the device or using special markers [8], [9], [10], [11], [13], [15]. A few others [6], [12], [16], and our work take a more holistic approach, applying automated localization as well. ...
Preprint
Full-text available
Despite essential efforts towards advanced wireless medical devices for regular monitoring of blood properties, many such devices are not available or not affordable for everyone in many countries. When using ordinary devices instead, patients must manually log data into a mobile health-monitoring app. This causes several issues: (1) clients reportedly tend to enter unrealistic data; (2) typing values several times a day is bothersome and causes clients to leave the mobile app. Thus, there is a strong need to use now-ubiquitous smartphones, reducing error by capturing images from the screen of medical devices and extracting useful information automatically. Nevertheless, there are a few challenges in its development: (1) data scarcity has led to impractical methods with very low accuracy: to our knowledge, only small datasets are available in this case; (2) accuracy-availability tradeoff: one can execute a less accurate algorithm on a mobile phone to maintain higher availability, or alternatively deploy a more accurate and more compute-intensive algorithm on the cloud, albeit at the cost of lower availability in poor/no connectivity situations. We present an ensemble learning algorithm, a mobile-cloud computing service architecture, and a simple compression technique to achieve higher availability and faster response time while providing higher accuracy by integrating cloud- and mobile-side predictions. Additionally, we propose an algorithm to generate synthetic training data which facilitates utilizing deep learning models to improve accuracy. Our proposed method achieves three main objectives: (1) 92.1% and 97.7% accuracy on two different datasets, improving previous methods by 40%; (2) reducing required bandwidth by 45x with a 1% drop in accuracy; and (3) providing better availability compared to mobile-only, cloud-only, split computing, and early exit service models.
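The accuracy-availability tradeoff above amounts to choosing between an always-available on-device model and a more accurate but connectivity-dependent cloud model. The sketch below shows one simple way to blend the two with an offline fallback; the weighting rule, timeout handling, and function names are assumptions, not the service architecture from the cited paper.

```python
# Sketch of mobile/cloud prediction blending with graceful offline fallback.
# The weighting rule and names are illustrative assumptions, not the paper's design.
from typing import Callable

def read_display(image,
                 mobile_model: Callable,
                 cloud_model: Callable,
                 cloud_available: Callable[[], bool],
                 cloud_weight: float = 0.6):
    """Return per-class probabilities, preferring an ensemble when online."""
    p_mobile = mobile_model(image)            # fast, less accurate, always available
    if not cloud_available():
        return p_mobile                       # degrade gracefully when offline
    try:
        p_cloud = cloud_model(image)          # slower, more accurate, needs connectivity
    except TimeoutError:
        return p_mobile
    # Simple weighted average of the two predictions (one possible ensembling rule).
    return [cloud_weight * c + (1 - cloud_weight) * m
            for c, m in zip(p_cloud, p_mobile)]
```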
... Beginning with traditional methods, Morris et al. [7] developed Clearspeech, a display reader for visually impaired people, by extending eigenimage recognition to detect digital display numbers. Nearly 90% of the evaluated displays were read accurately under proper lighting. ...
Preprint
Full-text available
In light of the COVID-19 pandemic, patients were required to manually input their daily oxygen saturation (SpO2) and pulse rate (PR) values into a health monitoring system; unfortunately, such a process is prone to typing errors. Several studies attempted to detect the physiological values from captured images using optical character recognition (OCR). However, that technology has limited availability and high cost. Thus, this study aimed to propose a novel framework called PACMAN (Pandemic Accelerated Human-Machine Collaboration) with low-resource deep learning-based computer vision. We compared state-of-the-art object detection algorithms (scaled YOLOv4, YOLOv5, and YOLOR), as well as commercial OCR tools, for digit recognition on images captured from pulse oximeter displays. All images were derived from crowdsourced data collection with varying quality and alignment. YOLOv5 was the best-performing model across all datasets, notably the correctly oriented image dataset. We further improved the model performance with a digit auto-orientation algorithm and applied a clustering algorithm to extract the SpO2 and PR values. With these additions, YOLOv5 achieved approximately 81.0-89.5% accuracy, an improvement over the model without them. Accordingly, this study demonstrates the PACMAN framework's ability to detect and read digits in real-world datasets. The proposed framework has now been integrated into the patient monitoring system used by hospitals nationwide.
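As one illustration of the final step above, detected digit boxes can be grouped into display rows and read left to right to recover the SpO2 and pulse-rate numbers. The grouping rule below and the "upper row is SpO2" convention are assumptions for illustration, not the clustering procedure from the cited paper.

```python
# Group per-digit detections (as a YOLO-style detector would output) into rows,
# then read each row left to right. Grouping rule is an illustrative assumption.
def group_digits(detections, row_tolerance=0.5):
    """detections: list of (digit, x_center, y_center, height)."""
    rows = []   # each row: detections sharing a similar y_center
    for det in sorted(detections, key=lambda d: d[2]):
        for row in rows:
            if abs(row[0][2] - det[2]) < row_tolerance * det[3]:
                row.append(det)
                break
        else:
            rows.append([det])
    # Read each row left to right and join digits into a number.
    return [int("".join(str(d[0]) for d in sorted(row, key=lambda d: d[1])))
            for row in rows]

# Example: two rows of detections -> [98, 72] (SpO2 above, pulse rate below).
boxes = [(9, 10, 20, 18), (8, 30, 21, 18), (7, 10, 60, 18), (2, 30, 61, 18)]
print(group_digits(boxes))
```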
... Directly pressing the physical buttons on a robot was one major way to start and stop a robot. Like other modern electrical appliances [29,44], participants found the buttons on many service robots challenging to use. Five participants reported the difficulty of distinguishing the physical buttons on their vacuum robots since the buttons were all flat (like a touchscreen) and blended into the robot surface without providing any tactile feedback. ...
... From the direct user's perspective, our study thoroughly revealed PVI's challenges, strategies, and needs when using and controlling different service robots (RQ1). Some of our findings confirmed the insights from prior research, such as the inaccessibility of the control panels on electrical appliances [29,44] and the difficulty of controlling drone movements accurately in real-time [30]. Beyond that, we uncovered more detailed evidence of PVI's challenges and coping strategies during their interactions with robots. ...
Preprint
Full-text available
Mobile service robots have become increasingly ubiquitous. However, these robots can pose potential accessibility issues and safety concerns to people with visual impairments (PVI). We sought to explore the challenges faced by PVI around mainstream mobile service robots and identify their needs. Seventeen PVI were interviewed about their experiences with three emerging robots: vacuum robots, delivery robots, and drones. We comprehensively investigated PVI's robot experiences by considering their different roles around robots -- direct users and bystanders. Our study highlighted participants' challenges and concerns about the accessibility, safety, and privacy issues around mobile service robots. We found that the lack of accessible feedback made it difficult for PVI to precisely control, locate, and track the status of the robots. Moreover, encountering mobile robots as bystanders confused and even scared the participants, presenting safety and privacy barriers. We further distilled design considerations for more accessible and safe robots for PVI.
... Based on the significance and challenges of recognizing content on digital displays through camera feeds, we consider it an emerging research problem that can be addressed by human-AI collaboration. From the perspective of AI solutions, there exist a few computer vision systems that assist blind users in reading the LCD panels on appliances [38,49,85,111]. However, these systems are heuristic-driven, fairly brittle, and only work in limited circumstances. ...
Conference Paper
Full-text available
Remote sighted assistance (RSA) has emerged as a conversational assistive technology for people with visual impairments (VI), where remote sighted agents provide real-time navigational assistance to users with visual impairments via video-chat-like communication. In this paper, we conducted a literature review and interviewed 12 RSA users to comprehensively understand technical and navigational challenges in RSA for both the agents and users. Technical challenges are organized into four categories: agents’ difficulties in orienting and localizing the users; acquiring the users’ surroundings and detecting obstacles; delivering information and understanding user-specific situations; and coping with a poor network connection. Navigational challenges are presented in 15 real-world scenarios (8 outdoor, 7 indoor) for the users. Prior work indicates that computer vision (CV) technologies, especially interactive 3D maps and real-time localization, can address a subset of these challenges. However, we argue that addressing the full spectrum of these challenges warrants new development in Human-CV collaboration, which we formalize as five emerging problems: making object recognition and obstacle avoidance algorithms blind-aware; localizing users under poor networks; recognizing digital content on LCD screens; recognizing texts on irregular surfaces; and predicting the trajectory of out-of-frame pedestrians or objects. Addressing these problems can advance computer vision research and usher in the next generation of RSA services.
... Assistance for Interacting with Inaccessible Interfaces or Environment. To interact with inaccessible spaces or interfaces, prior research has used computer vision to identify objects in the environment and components in digital interfaces (e.g., [10,27,31,70]). For example, Bigham et al. [10] leveraged crowd workers to help visually impaired individuals recognize various objects. Guo et al. [31] introduced VizLens, an accessible mobile app that helps people with visual impairments interact with inaccessible interfaces through crowdsourcing and computer vision. ...
Preprint
Full-text available
Makeup and cosmetics offer the potential for self-expression and the reshaping of social roles for visually impaired people. However, there exist barriers to conducting a beauty regime because of the reliance on visual information and color variances in makeup. We present a content analysis of 145 YouTube videos to demonstrate visually impaired individuals' unique practices before, during, and after doing makeup. Based on the makeup practices, we then conducted semi-structured interviews with 12 visually impaired people to discuss their perceptions of and challenges with the makeup process in more depth. Overall, through our findings and discussion, we present novel perceptions of makeup from visually impaired individuals (e.g., broader representations of blindness and beauty). The existing challenges provide opportunities for future research to address learning barriers, insufficient feedback, and physical and environmental barriers, making the experience of doing makeup more accessible to people with visual impairments.
... Beyond this traditional method, prior research also explored using computer vision [15,17,41,46], voice interactions [1,8], and 3D printed tactile marking [18,22] to better support people with visual impairments interacting with different interfaces. For example, VizLens leveraged computer vision and crowdsourcing to enable people with visual impairments to interact with different interfaces, such as a microwave oven [17]. ...
Preprint
Full-text available
The reliance on vision for tasks related to cooking and eating healthy can present barriers to cooking for oneself and achieving proper nutrition. There has been little research exploring cooking practices and challenges faced by people with visual impairments. We present a content analysis of 122 YouTube videos to highlight the cooking practices of visually impaired people, and we describe detailed practices for 12 different cooking activities (e.g., cutting and chopping, measuring, testing food for doneness). Based on the cooking practices, we also conducted semi-structured interviews with 12 visually impaired people who have cooking experience and show existing challenges, concerns, and risks in cooking (e.g., tracking the status of tasks in progress, verifying whether things are peeled or cleaned thoroughly). We further discuss opportunities to support the current practices and improve the independence of people with visual impairments in cooking (e.g., zero-touch interactions for cooking). Overall, our findings provide guidance for future research exploring various assistive technologies to help people cook without relying on vision.
... The Clearspeech system [9], for example, used marker stickers placed on the corners of the medical device display to aid in detecting the screen as a region of interest (ROI). Classification of the digits was then performed by eigenimage recognition, a technique developed for face recognition. ...
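Eigenimage recognition of this kind boils down to projecting digit images onto a small set of principal components and classifying in that subspace. The NumPy sketch below shows the generic idea; it is not the Clearspeech code, and the component count and nearest-neighbour rule are illustrative choices.

```python
# Generic eigenimage (PCA) classifier sketch: project flattened digit images
# onto principal components, then classify by nearest neighbour in that space.
import numpy as np

def fit_eigenimages(train_imgs: np.ndarray, n_components: int = 10):
    """train_imgs: (N, H*W) flattened grayscale digit images."""
    mean = train_imgs.mean(axis=0)
    centred = train_imgs - mean
    # Right singular vectors of the centred data are the eigenimages.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]               # (n_components, H*W)
    return mean, basis

def project(imgs: np.ndarray, mean, basis) -> np.ndarray:
    return (imgs - mean) @ basis.T          # coordinates in eigenimage space

def classify(test_img, train_coords, train_labels, mean, basis):
    coords = project(test_img[None, :], mean, basis)
    dists = np.linalg.norm(train_coords - coords, axis=1)
    return train_labels[int(np.argmin(dists))]
```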
Article
Full-text available
There is an increasing need for fast and accurate transfer of readings from blood glucose meters and blood pressure monitors to a smartphone mHealth application, without a dependency on Bluetooth technology. Most of the medical devices recommended for home monitoring use a seven-segment display to show the recorded measurement to the patient. We aimed to achieve accurate detection and reading of the seven-segment digits displayed on these medical devices using an image taken in a realistic scenario by a smartphone camera. A synthetic dataset of seven-segment digits was developed in order to train and test a digit classifier. A dataset containing realistic images of blood glucose meters and blood pressure monitors taken with a variety of smartphone cameras was also created. The digit classifier was evaluated on a dataset of seven-segment digits manually extracted from the medical device images. These datasets, along with the code for their development, have been made public. The developed algorithm first preprocessed the input image using retinex with two bilateral filters and adaptive histogram equalisation. Subsequently, the digit segments were automatically located within the image by two techniques operating in parallel: Maximally Stable Extremal Regions (MSER) and connected components of a binarised image. A filtering and clustering algorithm was then designed to combine digit segments to form seven-segment digits. The resulting digits were classified using a Histogram of Oriented Gradients (HOG) feature set and a neural network trained on the synthetic digits. The model achieved 93% accuracy on digits found on the medical devices. The digit location algorithm achieved F1 scores of 0.87 and 0.80 on images of blood glucose meters and blood pressure monitors, respectively. Very few assumptions were made about the locations of the digits on the devices, so the proposed algorithm can be easily implemented on new devices.
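Two stages of the pipeline described above, contrast enhancement followed by MSER region detection, and HOG feature extraction for a digit candidate, map directly onto standard OpenCV calls. The sketch below shows that mapping; the CLAHE, MSER, and HOG parameter values are generic defaults, not the values tuned in the paper, and the retinex/bilateral-filter preprocessing is omitted.

```python
# OpenCV sketch of two pipeline stages: CLAHE + MSER candidate detection, and
# HOG feature extraction for a digit patch. Parameter values are generic defaults.
import cv2
import numpy as np

def find_segment_candidates(gray: np.ndarray):
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)                       # adaptive histogram equalisation
    mser = cv2.MSER_create()
    regions, boxes = mser.detectRegions(enhanced)      # candidate digit segments
    return boxes                                       # (x, y, w, h) per region

def hog_features(patch: np.ndarray) -> np.ndarray:
    patch = cv2.resize(patch, (32, 64))                # fixed-size digit patch
    # winSize, blockSize, blockStride, cellSize, nbins
    hog = cv2.HOGDescriptor((32, 64), (16, 16), (8, 8), (8, 8), 9)
    return hog.compute(patch).ravel()                  # feature vector for the classifier
```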
... Making touchscreen interfaces accessible has been a longstanding challenge in accessibility [14,17,30], and some current platforms are quite accessible (e.g., iOS). Solving the combination of difficult issues posed by public touchscreen devices has remained elusive: (i) touchscreens are inherently visual, so a blind person cannot read what they say or identify user interface components; (ii) a blind person cannot touch the touchscreen to explore without the risk of accidentally triggering something they did not intend; and (iii) a blind person does not have the option to choose a different touchscreen platform that would be more accessible, and cannot get access to the software or hardware to make it work better. ...
... Many physical interfaces in the real world are inaccessible to blind people, which has led to substantial prior work on systems for making them accessible. Many specialized computer vision systems have been built to help blind people read the LCD panels on appliances [14,30,33]. These systems have tended to be fairly brittle, and have generally only targeted reading text and not actually using the interface. ...
Conference Paper
Blind people frequently encounter inaccessible dynamic touchscreens in their everyday lives that are difficult, frustrating, and often impossible to use independently. Touchscreens are often the only way to control everything from coffee machines and payment terminals, to subway ticket machines and in-flight entertainment systems. Interacting with dynamic touchscreens is difficult non-visually because the visual user interfaces change, interactions often occur over multiple different screens, and it is easy to accidentally trigger interface actions while exploring the screen. To solve these problems, we introduce StateLens - a three-part reverse engineering solution that makes existing dynamic touchscreens accessible. First, StateLens reverse engineers the underlying state diagrams of existing interfaces using point-of-view videos found online or taken by users using a hybrid crowd-computer vision pipeline. Second, using the state diagrams, StateLens automatically generates conversational agents to guide blind users through specifying the tasks that the interface can perform, allowing the StateLens iOS application to provide interactive guidance and feedback so that blind users can access the interface. Finally, a set of 3D-printed accessories enable blind people to explore capacitive touchscreens without the risk of triggering accidental touches on the interface. Our technical evaluation shows that StateLens can accurately reconstruct interfaces from stationary, hand-held, and web videos; and, a user study of the complete system demonstrates that StateLens successfully enables blind users to access otherwise inaccessible dynamic touchscreens.
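The reverse-engineered state diagrams at the heart of this approach can be thought of as a graph of screens connected by button presses, from which step-by-step guidance is just a shortest path to the goal screen. The toy sketch below illustrates that idea; the coffee-machine states and the breadth-first search are invented for illustration and are not StateLens' actual representation.

```python
# Toy state diagram of a touchscreen interface (screens as nodes, button presses
# as edges) and a breadth-first search that yields guidance to a goal screen.
# The coffee-machine states are invented for illustration.
from collections import deque

STATE_DIAGRAM = {
    "home":        {"Coffee": "coffee_menu", "Tea": "tea_menu"},
    "coffee_menu": {"Espresso": "confirm", "Latte": "confirm", "Back": "home"},
    "tea_menu":    {"Green": "confirm", "Back": "home"},
    "confirm":     {"Start": "brewing", "Cancel": "home"},
}

def guidance(start: str, goal: str) -> list[str]:
    """Return the sequence of button presses that leads from `start` to `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, presses = queue.popleft()
        if state == goal:
            return presses
        for button, nxt in STATE_DIAGRAM.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, presses + [button]))
    return []

print(guidance("home", "brewing"))   # e.g. ['Coffee', 'Espresso', 'Start']
```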