Figure: Average and standard deviation (in centimetres) of the length and height measured with and without obstacles.

Source publication
Article
Full-text available
Stairs are one of the most common structures present in human-made scenarios, but also one of the most dangerous for those with vision problems. In this work we propose a complete method to detect, locate and parametrise stairs with a wearable RGB-D camera. Our algorithm uses the depth data to determine if the horizontal planes in the scene are valid steps of a staircase...

Contexts in source publication

Context 1
... have excluded the width from the analysis as the view of the stairs may be partial and it is not as relevant as the other measurements. After computing the height and length of the staircases, in both ascending and descending perspectives and from different viewing angles, the results were compared to the real measurements, as shown in Table 2. As we can observe, the values do not deviate strongly even though the model is computed from one single frame. ...
Context 2
... presence of obstacles partially occluding the view of the staircase does not adversely affect the quality of the model, and we get similar results in terms of average measurements. Our experiments in Table 2 show a slightly better standard deviation in the presence of occluding obstacles. This unexpected result is due to the variability of the images in the set and not to the method itself. ...

Citations

... Because stairs are artificially constructed building structures with obvious geometric features, most stair detection methods rely on the extraction of some stair geometric features. For example, line-based extraction methods [1][2][3] abstract stair geometric features as a set of lines continuously distributed in an image, and potential stair lines are extracted in RGB or depth images through Canny edge detection [4], Sobel filtering, Hough transform [5], etc. Plane-based extraction methods [6][7][8] abstract stair geometric features as a set of planes continuously distributed in space, and potential stair surfaces in point cloud data are extracted through algorithms such as random sample consensus (RANSAC) [9] and supervoxel clustering [10]. ...
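As a minimal sketch of the line-based family described above (a generic Canny-plus-Hough pipeline, not any specific cited method; the image path, thresholds and angular tolerance are placeholder assumptions):

import cv2
import numpy as np

# Load a grayscale staircase image (path is a placeholder)
img = cv2.imread("stairs.png", cv2.IMREAD_GRAYSCALE)

# 1. Edge map via Canny edge detection
edges = cv2.Canny(img, threshold1=50, threshold2=150)

# 2. Candidate line segments via the probabilistic Hough transform
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=80, minLineLength=60, maxLineGap=10)

# 3. Keep near-horizontal segments: in a roughly fronto-parallel view,
#    stair edges appear as a stack of approximately parallel lines
stair_lines = []
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 20:  # angular tolerance is an arbitrary choice
            stair_lines.append((x1, y1, x2, y2))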
Article
Full-text available
Vision-based stair modeling can help autonomous mobile robots deal with the challenge of climbing stairs, especially in unfamiliar environments. To address the problem that current monocular methods struggle to model stairs accurately without depth information in scenes with fuzzy visual cues, this paper proposes a depth-aware stair modeling method for monocular vision. Specifically, we take the prediction of depth images and the extraction of stair geometric features as joint tasks in a convolutional neural network; with the designed information propagation architecture, we can achieve effective supervision of stair geometric feature learning from depth features. In addition, to complete the stair modeling, we take the convex lines, concave lines, tread surfaces and riser surfaces as stair geometric features and apply Gaussian kernels to enable StairNetV3 to predict contextual information within the stair lines. Combined with the depth information obtained by depth sensors, we propose a point cloud reconstruction method that can quickly segment point clouds of stair step surfaces. The experiments show that the proposed method has a significant improvement over the previous best monocular vision method, with an intersection over union increase of 3.4%, and the lightweight version has a fast detection speed and can meet the requirements of most real-time applications.
... The transformation ^C T_F is then computed by aligning the y axis of the new reference frame F with the ground plane normal n_f (Fig. 4). Additionally, the remaining two Manhattan World directions [43] can be recovered by considering the best alignment with the normals of the rest of the scene, following [44]. ...
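As a rough illustration of how such an alignment can be computed (an illustrative sketch, not the authors' implementation; the function name and example normal are made up), the rotation part of ^C T_F can be built from the ground normal n_f alone:

import numpy as np

def rotation_aligning_y_to_normal(n_f):
    # New y axis: the ground plane normal
    y = n_f / np.linalg.norm(n_f)
    # Any reference vector not parallel to y spans the remaining axes
    ref = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(ref, y)) > 0.9:
        ref = np.array([0.0, 0.0, 1.0])
    z = np.cross(ref, y)
    z /= np.linalg.norm(z)
    x = np.cross(y, z)  # completes a right-handed frame
    # Columns are the axes of frame F expressed in camera coordinates,
    # i.e. the rotation part of ^C T_F
    return np.column_stack((x, y, z))

# Example: ground normal roughly opposite the camera's y axis
R = rotation_aligning_y_to_normal(np.array([0.05, -0.99, 0.10]))

The remaining Manhattan directions would then refine x and z against the other scene normals, as the excerpt describes.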
... Here, the most relevant ones are described: Stair detection: Stairs can be a dangerous structure for the visually impaired, being a potential source of accidents. Here, we use the method proposed in [44], [48], which uses RGB-D cameras and thus is straightforward to integrate into our system. This method recovers the pose of the stair with respect to the camera (^C T_S) and the measurements of every step, therefore providing the complete stair model (Fig. 4), useful not only to alert the subject but also for guidance. ...
... For the stairs detection in real settings, we tested several sequences provided by [44] (Fig. 11). In the first example, we can see how the system is able to inform the subject of the presence of both ascending and descending staircases from a large distance, also recovering the full pose and thus enabling the possibility of guiding the user to face the staircase straight or close to the handrails. ...
Article
Full-text available
One of the main challenges of visual prostheses is to augment the perceived information to improve the experience of their wearers. Given the limited access to implanted patients, new techniques are often evaluated via Simulated Prosthetic Vision (SPV) with sighted people in order to facilitate experimentation. In this work, we introduce a novel SPV framework and implementation that presents major advantages with respect to previous approaches. First, it is integrated into a robotics framework, which allows us to benefit from a wide range of methods and algorithms from the field (e.g. object recognition, SLAM, obstacle avoidance, autonomous navigation, deep learning). Second, we go beyond traditional image processing with 3D point cloud processing using an RGB-D camera, allowing us to robustly detect the floor, obstacles and the structure of the scene. Third, it works either with a real camera or in a virtual environment, which gives us endless possibilities for immersive experimentation through a head-mounted display. Fourth, we incorporate a validated temporal phosphene model that replicates time effects into the generation of visual stimuli. Finally, we have proposed, developed and tested several applications within this framework, such as avoiding moving obstacles, providing a general understanding of the scene, staircase detection, helping the subject to navigate an unfamiliar space and reach a destination, and object and person detection. We provide experimental results in real and virtual environments. The code will be publicly available at www.github.com/aperezyus/RASPV.
... Because stairs are artificially constructed building structures with obvious geometric features, most stair detection methods rely on extracting some stair geometric features. For example, line-based extraction methods [1,2,3] abstract the stair geometric features as a set of lines continuously distributed in an image, and potential stair lines are extracted in RGB or depth images through Canny edge detection [4], Sobel filtering, Hough transform [5], etc. Plane-based extraction methods [6,7,8] abstract the stair geometric features as a set of parallel planes continuously distributed in space, and potential stair surfaces in point cloud data are extracted through algorithms such as random sample consensus (RANSAC) [9] and supervoxel clustering [10]. ...
Preprint
Vision-based stair perception can help autonomous mobile robots deal with the challenge of climbing stairs, especially in unfamiliar environments. To address the problem that current monocular vision methods struggle to model stairs accurately without depth information, this paper proposes a depth-aware stair modeling method for monocular vision. Specifically, we take the extraction of stair geometric features and the prediction of depth images as joint tasks in a convolutional neural network (CNN); with the designed information propagation architecture, we can achieve effective supervision of stair geometric feature learning from depth information. In addition, to complete the stair modeling, we take the convex lines, concave lines, tread surfaces and riser surfaces as stair geometric features and apply Gaussian kernels to enable the network to predict contextual information within the stair lines. Combined with the depth information obtained by depth sensors, we propose a stair point cloud reconstruction method that can quickly obtain the point clouds belonging to the stair step surfaces. Experiments on our dataset show that our method achieves a significant improvement over the previous best monocular vision method, with an intersection over union (IOU) increase of 3.4%, and the lightweight version has a fast detection speed and can meet the requirements of most real-time applications. Our dataset is available at https://data.mendeley.com/datasets/6kffmjt7g2/1.
... Point cloud segmentation is a common plane extraction method, and many methods have been developed for stair feature matching. Classifying planes by their normal vectors and eliminating the planes that do not belong to the stairs is a common approach [20, 21]. Sinha et al. [22] present an algorithm for stair detection from point clouds based on a new minimal 3D map representation and the estimation of step-like features that are grouped based on adjacency so that dominant staircase structures emerge. ...
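A minimal sketch of this normal-based plane classification, using Open3D's RANSAC plane segmentation (the thresholds, file path and assumed gravity direction are illustrative choices, not taken from the cited methods):

import numpy as np
import open3d as o3d

# Load a scene point cloud (path is a placeholder)
pcd = o3d.io.read_point_cloud("scene.pcd")

up = np.array([0.0, 0.0, 1.0])  # assumed gravity direction
treads = []
rest = pcd
for _ in range(8):  # extract up to 8 dominant planes
    if len(rest.points) < 100:
        break
    model, inliers = rest.segment_plane(distance_threshold=0.02,
                                        ransac_n=3,
                                        num_iterations=500)
    n = np.asarray(model[:3])
    n /= np.linalg.norm(n)
    # Keep near-horizontal planes (candidate stair treads);
    # the 15-degree tolerance is an arbitrary choice
    if abs(np.dot(n, up)) > np.cos(np.radians(15)):
        treads.append(rest.select_by_index(inliers))
    rest = rest.select_by_index(inliers, invert=True)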
Article
Full-text available
Staircases are some of the most common building structures in urban environments. Stair detection is an important task for various applications, including the environmental perception of exoskeleton robots, humanoid robots and rescue robots, and the navigation of visually impaired people. Most existing stair detection algorithms have difficulty dealing with the diversity of stair structures and materials, extreme lighting and serious occlusion. Inspired by human perception, we propose an end-to-end method based on deep learning. Specifically, we treat the process of stair line detection as a multitask involving coarse-grained semantic segmentation and object detection. The input images are divided into cells, and a simple neural network is used to judge whether each cell contains stair lines. For cells containing stair lines, the locations of the stair lines relative to each cell are regressed. Extensive experiments on our dataset show that our method can achieve 81.49% accuracy, 81.91% recall and a 12.48 ms runtime, and our method has higher performance in terms of both speed and accuracy than previous methods. A lightweight version can even achieve 300+ frames per second at the same resolution.
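To make the per-cell formulation concrete, here is a toy PyTorch head in the spirit of what the abstract describes (the layer sizes, grid resolution and sigmoid parameterisation are our own assumptions, not the paper's architecture):

import torch
import torch.nn as nn

class StairLineHead(nn.Module):
    # For each grid cell: a confidence that the cell contains a stair
    # line, plus regressed endpoints of that line in cell coordinates
    def __init__(self, in_ch=256):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, 1, kernel_size=1)  # line / no line
        self.reg = nn.Conv2d(in_ch, 4, kernel_size=1)  # (x1, y1, x2, y2)

    def forward(self, feat):
        conf = torch.sigmoid(self.cls(feat))  # (B, 1, H_cells, W_cells)
        ends = torch.sigmoid(self.reg(feat))  # normalized to [0, 1] per cell
        return conf, ends

# Example: a 16x16 grid of cells over backbone features
conf, ends = StairLineHead()(torch.randn(1, 256, 16, 16))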
... The obtained class is coded and mapped onto a braille display. In [7], the authors propose a system that uses visual odometry, region growing, Euclidean cluster extraction and depth data to determine if the horizontal planes are valid steps of a staircase. In [8], scene understanding is implemented using deep learning techniques. ...
Article
Full-text available
The aim of this work is to provide a semantic scene synthesis from a single depth image. This is used in assistive aid systems for visually impaired and blind people that allows them to understand their surroundings by the touch sense. The fact that blind people use touch to recognize objects and rely on listening to replace sight motivated us to propose this work. First, the acquired depth image is segmented and each segment is classified in the context of assistive systems using a deep learning network. Second, inspired by the Braille system and the Japanese writing system Kanji, the obtained classes are coded with semantic labels. The scene is then synthesized using these labels and the extracted geometric features. Our system is able to predict more than 17 classes only by understanding the provided illustrative labels. For the remaining objects, their geometric features are transmitted. The labels and the geometric features are mapped on a synthesis area to be sensed by the touch sense. Experiments are conducted on noisy and incomplete data including acquired depth images of indoor scenes and public datasets. The obtained results are reported and discussed.
... Segmentation on point clouds offers a more holistic recognition. Some classical point cloud segmentation works focus on designing principled algorithms for walkable area detection [WKT+17, ZLMAEB19] or stairs navigation [PGLG17, YWCZ15]. Semantic instance segmentation has the potential to enable a global scene understanding. ...
Thesis
Full-text available
Independently exploring unknown spaces or finding objects in an indoor environment is a daily but challenging task for visually impaired people. Previous assistive systems lack depth relationships between the various objects, making it difficult to obtain an accurate spatial layout and the relative positions of objects. Semantic segmentation enables a complete understanding of the surrounding environment. By combining semantic and position information, a high-level indoor scene understanding is possible. In this work, an assistive system based on semantic segmentation is proposed. The entire system consists of three hardware components and two interactive assistive modes. The first mode is designed for holistic indoor detection and avoidance. Based on voice guidance, the point cloud of the most recent state of the changing indoor environment is captured through on-site scanning performed by the user. A point cloud segmentation model is applied in this mode to generate the 3D semantic instance map. After this 3D instance segmentation, the system integrates the information above and interacts with users intuitively by acoustic feedback. The second mode is based on RGB-Depth semantic segmentation. A two-stream multi-modal segmentation framework is proposed, which uses a Feature Rectification Module (FRM) to bi-directionally enhance the features of the current modality. For the feature pairs extracted from the two branches, a Feature Fusion Module (FFM) is applied to merge them for semantic prediction. Image segmentation is much faster than point cloud segmentation, so this mode aims at real-time perception of walkable areas and obstacles. An estimated obstacle distance is calculated by combining the semantic prediction with the corresponding depth map. These two complementary modes realize high-level perception for visually impaired people. The proposed 3D instance segmentation model and 2D RGB-Depth semantic segmentation model achieve leading performance on multiple datasets. Comprehensive field tests with various tasks in a user study verify the usability and effectiveness of the system for assisting visually impaired people in indoor scene understanding.
... Point cloud segmentation is a common plane extraction method, and many methods have been developed for stair feature matching. Classifying planes by their normal vectors and eliminating the planes that do not belong to the stairs is a common approach [25, 26]. [27] presents an algorithm for stair detection from point clouds based on a new minimal 3D map representation and the estimation of step-like features that are grouped based on adjacency so that dominant staircase structures emerge. ...
Preprint
Full-text available
Staircases are some of the most common building structures in urban environments. Stair detection is an important task for various applications, including the environmental perception of exoskeleton robots, humanoid robots and rescue robots, and the navigation of visually impaired people. Most existing stair detection algorithms have difficulty dealing with the diversity of stair structures and materials, extreme lighting and serious occlusion. Inspired by human perception, we propose an end-to-end method based on deep learning. Specifically, we treat the process of stair line detection as a multitask involving coarse-grained semantic segmentation and object detection. The input images are divided into cells, and a simple neural network is used to judge whether each cell contains stair lines. For cells containing stair lines, the locations of the stair lines relative to each cell are regressed. Extensive experiments on our dataset show that our method achieves high performance in terms of both speed and accuracy. A lightweight version can even achieve 300+ frames per second at the same resolution. Our code is available on GitHub.
... It provides a detection of the locomotion terrain(s), i.e. flat ground, stairs, and/or ramp, as output, which reduces the computational load since not all points bring a significant amount of new information. Then, for each point, a normal vector is estimated from its neighboring points based on a Principal Component Analysis (PCA) [2]. After that, the point cloud data with their normals are passed to a clustering and plane-fitting step. ...
... The point cloud data are clustered into several groups using a region growing algorithm. It starts from a point with minimum curvature, called the seed, and expands the region to include neighboring points having quasi-parallel normals and similar curvature [2]. Next, a plane is fitted to each cluster using the RANdom SAmple Consensus (RANSAC) algorithm [3]. ...
... The features of the identified planes can then be rotated into the inertial frame. This approach is expected to be less computationally demanding than the more classical transformations that rotate the whole point cloud before feature extraction [2], [4], [5]. Finally, a classification tree is used for terrain recognition. ...
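A compact, self-contained rendition of the normals-then-clustering pipeline sketched in these excerpts (the parameter values and the omission of the curvature test are our simplifying assumptions, not the chapter's implementation):

import numpy as np
from scipy.spatial import cKDTree

def pca_normals(pts, k=20):
    # Per-point normal from the k nearest neighbors: the eigenvector
    # of the local covariance matrix with the smallest eigenvalue
    tree = cKDTree(pts)
    _, idx = tree.query(pts, k=k)
    normals = np.empty_like(pts)
    for i, nb in enumerate(idx):
        _, vecs = np.linalg.eigh(np.cov(pts[nb].T))  # ascending eigenvalues
        normals[i] = vecs[:, 0]
    return normals

def region_growing(pts, normals, k=20, angle_deg=10.0):
    # Grow regions from seeds over neighbors with quasi-parallel
    # normals (curvature test omitted for brevity)
    tree = cKDTree(pts)
    _, idx = tree.query(pts, k=k)
    labels = np.full(len(pts), -1)
    cos_t = np.cos(np.radians(angle_deg))
    region = 0
    for seed in range(len(pts)):
        if labels[seed] != -1:
            continue
        labels[seed] = region
        stack = [seed]
        while stack:
            p = stack.pop()
            for nb in idx[p]:
                if labels[nb] == -1 and abs(np.dot(normals[p], normals[nb])) > cos_t:
                    labels[nb] = region
                    stack.append(nb)
        region += 1
    return labels

Each resulting cluster would then go through a RANSAC plane fit (see the Open3D sketch earlier) before terrain classification.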
Chapter
Vision-based systems for terrain detection play important roles in mobile robotics, and recently such systems have emerged for locomotion assistance of disabled people. For instance, they can be used as wearable devices to assist blind people or to guide a prosthesis or exoskeleton controller to retrieve gait patterns adapted to the executed task (overground walking, stairs, slopes, etc.). In this paper, we present a computer vision-based algorithm achieving the detection of flat ground, steps, and ramps using a depth camera. Starting from point cloud data collected by the camera, it classifies the environment as a function of extracted features. We further provide a pilot validation in an indoor environment containing a rich set of different types of terrains, even with partial occlusion, and observed that the overall system accuracy is above 94%. The paper further shows that our system needs less computational resources than recently published concurrent approaches, owing to the original transformation method we developed.
... Compared to 2D segmentation-driven assistance, 3D scene parsing systems [6,58,62] fall behind, as these classical point cloud segmentation works focus on designing principled algorithms for walkable area detection [1,48,62] or stairs navigation [44,58,59]. In this work, we devise a 3D semantic instance segmentation system for helping visually impaired people perceive the entire surrounding and provide a top-view understanding, which is critical for various indoor travelling and mapping tasks [20,28,29,33]. ...
... It computes the eigenvalues and eigenvectors of a covariance matrix built from the direct neighboring points. The normal direction is estimated as the eigenvector having the smallest eigenvalue [13]. • The point cloud data is then clustered into several groups using a region growing algorithm. ...
... This algorithm outputs a set of points (a cluster) belonging to the same smooth surface. The region growing algorithm has good physical relevance since the planes are detected from a closed region depending on a single element (the seed) rather than on a set of uncorrelated points distributed in the scene [12], [13]. • Finally, a plane is fitted to each cluster using the RANdom SAmple Consensus (RANSAC) algorithm developed by [18]. ...
... This orientation can be provided, e.g., by an IMU attached to the camera. Typically, this transformation is applied to the whole point cloud just after acquisition, and thus before feature extraction [6], [7], [13]. Here, we applied the transformation to the obtained plane features (centroid and slope), since it is not necessary to rotate and translate all points of the acquired cloud. ...
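The cost argument is easy to see in code: rotating only the plane features is O(number of planes) instead of O(number of points) for the full cloud. A minimal sketch of the rotation step (translation omitted; the names and example values are illustrative):

import numpy as np

def rotate_plane_features(R_wc, centroids, normals):
    # Apply the camera-to-world rotation (e.g. from the IMU) to the
    # plane features only, instead of to every point in the cloud
    return centroids @ R_wc.T, normals @ R_wc.T

# Example: two planes and a 10-degree tilt about the x axis
a = np.radians(10.0)
R_wc = np.array([[1.0, 0.0, 0.0],
                 [0.0, np.cos(a), -np.sin(a)],
                 [0.0, np.sin(a), np.cos(a)]])
c_w, n_w = rotate_plane_features(R_wc,
                                 np.array([[0.0, 0.5, 2.0], [0.0, 0.3, 2.5]]),
                                 np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]))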