Article
PDF Available

Robot Vision Architecture for Autonomous Clothes Manipulation


Abstract

This paper presents a novel robot vision architecture for perceiving generic 3D clothes configurations. Our architecture is hierarchically structured, starting from low-level curvatures, through mid-level geometric shape and topology descriptions, and finally arriving at high-level semantic surface structure descriptions. We demonstrate our robot vision architecture on a customised dual-arm industrial robot with our self-designed, off-the-shelf stereo vision system, carrying out autonomous grasping and dual-arm flattening. It is worth noting that the proposed dual-arm flattening approach is unique among state-of-the-art autonomous robot systems, which is the major contribution of this paper. The experimental results show that the proposed dual-arm flattening using the stereo vision system remarkably outperforms single-arm flattening and the widely cited Kinect-based sensing system for dexterous manipulation tasks. In addition, the proposed grasping approach achieves satisfactory performance on grasping various kinds of garments, verifying the capability of the proposed visual perception architecture to be adapted to more than one clothes manipulation task.
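As a rough, language-level illustration of this low-to-high-level hierarchy (not the authors' implementation; every function name and threshold below is a hypothetical placeholder), a minimal Python sketch might look like:

```python
import numpy as np

# Minimal sketch of the three-level hierarchy described in the abstract.
# All names and thresholds are hypothetical placeholders.

def low_level_curvatures(depth_map: np.ndarray) -> np.ndarray:
    """Estimate per-pixel surface curvature from a depth map (placeholder)."""
    gy, gx = np.gradient(depth_map)
    gyy, _ = np.gradient(gy)
    _, gxx = np.gradient(gx)
    return gxx + gyy  # crude mean-curvature proxy, for illustration only

def mid_level_topology(curvature: np.ndarray) -> np.ndarray:
    """Label each pixel with a coarse geometric class (ridge / rut / flat)."""
    labels = np.zeros_like(curvature, dtype=int)
    labels[curvature > 0.01] = 1    # ridge-like
    labels[curvature < -0.01] = -1  # rut-like
    return labels

def high_level_wrinkles(labels: np.ndarray) -> list:
    """Group ridge-like pixels into wrinkle candidates (placeholder)."""
    ridge_pixels = np.argwhere(labels == 1)
    return [ridge_pixels] if len(ridge_pixels) else []

depth = np.random.rand(64, 64)  # stand-in for a stereo depth map
wrinkles = high_level_wrinkles(mid_level_topology(low_level_curvatures(depth)))
```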
... In this work, wrinkle descriptions are used to guide a flattening-by-pulling process, in particular the main direction of the largest wrinkles. This work is continued in Sun et al. (2016a); Sun et al. (2018), where a hierarchical visual architecture is described: low-level features consisting of surface curvatures are computed from the B-spline surface fitted to the raw data; from these, Shape Index features are derived as mid-level features (they capture the local topology of the surface, i.e., whether it is a cup, a trough, a saddle, a ridge, a dome, etc., up to nine surface types), which finally allow detecting and quantifying wrinkles as high-level features (fifth-order polynomials fitted to the ridges, while ruts and domes are used for splitting wrinkles). ...
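The Shape Index referred to here is Koenderink's curvature-based measure. A small Python sketch, assuming the two principal curvatures k1 >= k2 have already been obtained from the fitted B-spline surface (note that sign and ordering conventions for the index vary between papers):

```python
import numpy as np

def shape_index(k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """Koenderink-style shape index in [-1, 1] from principal curvatures k1 >= k2.
    arctan2 handles the umbilic case k1 == k2; sign conventions differ in the literature."""
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

# Quantise the continuous index into nine surface types
# (cup, trough, rut, saddle rut, saddle, saddle ridge, ridge, dome, cap).
BIN_EDGES = np.linspace(-1.0, 1.0, 10)

def surface_type(k1, k2):
    return np.digitize(shape_index(k1, k2), BIN_EDGES[1:-1])
```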
... See for example Ramisa et al. (2011), which includes an experimental study of different grasping strategies. Also Sun et al. (2018) use their hierarchical vision architecture (recall Section 2.1) to compute, from the shape index and surface topologies, adequate grasping triples (one ridge point and two wrinkle contour points on either side of the wrinkle). For non-convex shapes of the cloth item, neither the centroid nor the center of the convex hull may provide suitable grasping points. ...
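A hedged sketch of how such a grasping triple could be selected from detected wrinkle geometry (an illustrative rule only, not Sun et al.'s exact criterion; all names are placeholders):

```python
import numpy as np

def grasping_triple(ridge_pts: np.ndarray, contour_pts: np.ndarray):
    """Select one ridge point and two wrinkle-contour points on opposite sides.

    ridge_pts, contour_pts: (N, 3) arrays of 3D points (z = height above the table).
    Illustrative rule: take the highest ridge point, then the closest contour
    point on each side of the wrinkle direction. Assumes both sides are non-empty.
    """
    grasp = ridge_pts[np.argmax(ridge_pts[:, 2])]       # highest ridge point
    axis = ridge_pts[-1, :2] - ridge_pts[0, :2]         # wrinkle direction in the table plane
    d = contour_pts[:, :2] - grasp[:2]
    side = axis[0] * d[:, 1] - axis[1] * d[:, 0]        # sign of the 2D cross product
    dist = np.linalg.norm(d, axis=1)
    left = contour_pts[side > 0][np.argmin(dist[side > 0])]
    right = contour_pts[side <= 0][np.argmin(dist[side <= 0])]
    return grasp, left, right
```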
... As for the second type of flattening actions, combinations of longitudinal brushing over wrinkles with pinching and pulling by a single-armed robot have been addressed in Sun et al. (2015, 2016a) and by bimanual manipulation in Lee et al. (2015). More recently, Sun et al. (2018) also resort to a dual-arm flattening setting, using the hierarchical visual architecture mentioned in Section 2.1. In all cases, the main issue is to detect the wrinkles in order to proceed to their removal, but other sensing tasks are needed as well: ...
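As an illustration of how a detected wrinkle's direction can drive a two-handed pull (a simplified rule of thumb, not the planner of any of the cited works; the half-width and pull distance are assumed values):

```python
import numpy as np

def flattening_pulls(wrinkle_pts: np.ndarray, half_width=0.02, pull_dist=0.03):
    """Plan two opposing pulls perpendicular to a detected wrinkle (sketch).

    wrinkle_pts: (N, 2) table-plane points along the wrinkle ridge.
    Returns two (grasp_point, displacement) pairs, one per arm. Half-width and
    pull distance are assumed values; this is an illustrative rule only.
    """
    centre = wrinkle_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(wrinkle_pts - centre, full_matrices=False)
    direction = vt[0]                                  # principal wrinkle direction
    normal = np.array([-direction[1], direction[0]])   # perpendicular, in the plane
    grasp_a = centre + normal * half_width             # one arm on each side of the ridge
    grasp_b = centre - normal * half_width
    return [(grasp_a, normal * pull_dist), (grasp_b, -normal * pull_dist)]
```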
Article
Full-text available
Assistive robots need to be able to perform a large number of tasks that imply some type of cloth manipulation. These tasks include domestic chores such as laundry handling or bed-making, among others, as well as dressing assistance to disabled users. Due to the deformable nature of fabrics, this manipulation requires strong perceptual feedback. Common perceptual skills that enable robots to complete their cloth manipulation tasks are reviewed here, mainly relying on vision, but also resorting to touch and force. The use of such basic skills is then examined in the context of the different cloth manipulation tasks, be they garment-only applications in the line of performing domestic chores, or applications involving physical contact with a human, as in dressing assistance.
... A common method of cloth unfolding is to lay the garment flat on a surface and unfold it, as in a pick-and-place problem [1,2,3,4]. In [3], similar to our method, the authors present an analysis of the types of corners in order to find strategies for unfolding. ...
Preprint
Full-text available
Compared with more rigid objects, clothing items are inherently difficult for robots to recognize and manipulate. We propose a method for detecting how cloth is folded, to facilitate choosing a manipulative action that corresponds to a garment's shape and position. The proposed method involves classifying the edges and corners of a garment by distinguishing between edges formed by folds and the hem or ragged edge of the cloth. Identifying the type of edges in a corner helps to determine how the object is folded. This bottom-up approach, together with an active perception system, allows us to select strategies for robotic manipulation. We corroborate the method using a two-armed robot to manipulate towels of different shapes, textures, and sizes.
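A toy sketch of this bottom-up idea, mapping the types of the two edges meeting at a corner to a fold hypothesis (the labels and the mapping are illustrative assumptions, not the authors' taxonomy):

```python
# Illustrative mapping from the two edge labels meeting at a corner to a fold
# hypothesis. The edge labels ("hem" vs "fold") follow the idea in the text;
# the corner categories themselves are assumptions.
CORNER_TYPE = {
    ("hem", "hem"): "original garment corner (no fold)",
    ("hem", "fold"): "corner produced by a single fold",
    ("fold", "hem"): "corner produced by a single fold",
    ("fold", "fold"): "corner produced by two folds",
}

def classify_corner(edge_a: str, edge_b: str) -> str:
    return CORNER_TYPE.get((edge_a, edge_b), "unknown")
```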
... In [27] the authors proposed an architecture that allows a robot to grasp and manipulate clothes. This architecture uses the stereo cameras of a Kinect and combines 2D features and 3D maps to generate a solid descriptor for the desired objects. ...
Article
Full-text available
Nowadays, robots are indispensable in industry, especially the logistics industry, to replace human employees performing heavy lifting tasks. Introducing robots prevents the musculoskeletal disorders that are common in an ageing workforce. We designed and implemented a dual-arm robot to grasp cardboard boxes of different dimensions using hybrid force/position control. In a first step, the position of the cardboard box was estimated using markers and ARtags, together with an integrated camera. However, this solution showed some limitations, because it is not possible to place an ARtag on every cardboard box in a logistics warehouse. In this paper, we propose a new method to estimate the position of one cardboard box based on vision, without the need for markers at all. Our method exploits the advantages of the integrated RGBD camera through the use of strong features and perspective geometry. It is well suited to the case of one cardboard box due to the simplicity of its geometric shape. The experiments show that our method is fast, robust, and precise, and, of course, is better suited to the logistics warehouse environment than the marker-based estimation procedure for palletization applications.
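A minimal sketch of markerless face-pose recovery from the segmented RGBD points of the visible box face (plane fitting by SVD; this omits the paper's perspective-geometry refinement, and all names are placeholders):

```python
import numpy as np

def box_face_pose(points: np.ndarray):
    """Estimate centre and orientation of a visible cardboard face (sketch).

    points: (N, 3) 3D points segmented on the front face of the box. Plane
    fitting via SVD; the segmentation step itself is omitted.
    """
    centre = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centre, full_matrices=False)
    normal = vt[-1]                                      # direction of least variance
    rotation = np.stack([vt[0], vt[1], normal], axis=1)  # columns: in-plane x, y and the normal
    return centre, rotation
```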
Article
Full-text available
In this paper, a series of studies dealing with flexible materials, covering their manipulation, modelling, and scheduling, is discussed. The main purpose of this work is to provide an overview of the existing technologies and their capabilities, both in manufacturing and in academia, that can be applied to autonomous flexible material handling using robotics. The particularities of flexible material handling require advanced control systems for simulating, monitoring, and managing the deformation of plies. A simulation model for predicting and defining the status of manipulated fabrics is proposed. A digital representation of the production system, on the basis of a Digital Twin, is intended to achieve real-time adaptation. A pioneering control and planning system, interconnected with the digital model, is proposed for orchestrating the manipulation process. Current limitations of the existing technologies in flexible material handling and modelling are outlined and discussed, towards the implementation of a Workcell controller for a flexible material manipulation robotic cell.
Article
Full-text available
Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
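The iterative matching loop can be summarised in a few lines; in this sketch, `render`, `predict_delta`, and `compose` are hypothetical stand-ins for the renderer, the DeepIM network, and SE(3) composition rather than the released API:

```python
def refine_pose(observed, initial_pose, render, predict_delta, compose, iters=4):
    """Iterative render-and-compare refinement in the spirit of DeepIM (sketch).

    render(pose)            -> rendered image of the object at `pose`
    predict_delta(obs, ren) -> relative SE(3) correction predicted by the network
    compose(delta, pose)    -> apply the correction to the current pose
    All three are hypothetical callables.
    """
    pose = initial_pose
    for _ in range(iters):
        rendered = render(pose)
        delta = predict_delta(observed, rendered)
        pose = compose(delta, pose)
    return pose
```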
Article
Full-text available
We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. In this work, we apply a deep learning approach to solve this problem, which avoids time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a huge number of candidate grasps. In order to make detection fast, as well as robust, we present a two-step cascaded structure with two deep networks, where the top detections from the first are re-evaluated by the second. The first network has fewer features, is faster to run, and can effectively prune out unlikely candidate grasps. The second, with more features, is slower but has to run only on the top few detections. Second, we need to handle multimodal inputs well, for which we present a method to apply structured regularization on the weights based on multimodal group regularization. We demonstrate that our method outperforms the previous state-of-the-art methods in robotic grasp detection, and can be used to successfully execute grasps on a Baxter robot.
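The two-step cascade reduces to a prune-then-rescore loop; a hedged sketch with hypothetical `small_net` and `large_net` scoring callables:

```python
def detect_grasp(image, candidates, small_net, large_net, keep_top=100):
    """Two-stage cascade (sketch): a fast, small network prunes the candidate
    grasp rectangles; a larger network re-scores only the survivors.
    `small_net(image, g)` and `large_net(image, g)` are hypothetical callables
    returning a graspability score for candidate g."""
    ranked = sorted(candidates, key=lambda g: small_net(image, g), reverse=True)
    survivors = ranked[:keep_top]                              # cheap pruning stage
    return max(survivors, key=lambda g: large_net(image, g))   # expensive re-scoring
```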
Conference Paper
Full-text available
In this paper, we propose a single-shot approach to recognise clothing categories from 2.5D features. We propose two visual features, BSP (B-Spline Patch) and TSD (Topology Spatial Distances), for this task. The local BSP features are encoded by the LLC (Locality-constrained Linear Coding) technique and fused with three different global features. Our visual feature is robust to the deformable shape of clothing, and the proposed approach is able to recognise the category of unknown clothing in unconstrained and random configurations. We integrated the category recognition pipeline with a stereo vision system, clothing instance detection, and dual-arm manipulators to achieve an autonomous sorting system. To verify the performance of our proposed method, we built a high-resolution RGBD clothing dataset of 50 clothing items of 5 categories sampled in random configurations (a total of 2,100 clothing samples). Experimental results show that our approach is able to reach 83.2% accuracy when classifying unseen clothing items, which advances the state of the art by 36.2%. Finally, we evaluate the proposed approach in an autonomous robot sorting system, in which the robot recognises a clothing item from an unconstrained pile, grasps it, and sorts it into a box according to its category. Our proposed sorting system achieves a reasonable sorting success rate with single-shot perception.
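For reference, the approximated LLC coding step applied to the local BSP descriptors can be sketched as follows (a generic implementation of locality-constrained linear coding, not the authors' code; the neighbourhood size k and the regulariser are assumed settings):

```python
import numpy as np

def llc_encode(x, codebook, k=5, lam=1e-4):
    """Approximated locality-constrained linear coding of one local descriptor.

    x: (D,) descriptor, codebook: (M, D). Generic LLC sketch with assumed settings.
    """
    idx = np.argsort(np.linalg.norm(codebook - x, axis=1))[:k]  # k nearest bases
    z = codebook[idx] - x                                        # shift bases to the descriptor
    C = z @ z.T
    C += lam * np.trace(C) * np.eye(k)                           # regularise
    w = np.linalg.solve(C, np.ones(k))
    code = np.zeros(len(codebook))
    code[idx] = w / w.sum()                                      # codes sum to one
    return code

def image_feature(local_bsp_descs, codebook, global_feats):
    """Max-pool the LLC codes of all local patches, then fuse with global features."""
    pooled = np.max([llc_encode(d, codebook) for d in local_bsp_descs], axis=0)
    return np.concatenate([pooled, global_feats])
```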
Article
Full-text available
To reduce data collection time for deep learning of robust robotic grasp plans, we explore training from a synthetic dataset of 6.7 million point clouds, grasps, and robust analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table. We use the resulting dataset, Dex-Net 2.0, to train a Grasp Quality Convolutional Neural Network (GQ-CNN) model that rapidly classifies grasps as robust from depth images and the position, angle, and height of the gripper above a table. Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8s with a success rate of 93% on eight known objects with adversarial geometry and is 3x faster than registering point clouds to a precomputed dataset of objects and indexing grasps. The GQ-CNN is also the highest performing method on a dataset of ten novel household objects, with zero false positives out of 29 grasps classified as robust and a 1.5x higher success rate than a point cloud registration method.
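At inference time the GQ-CNN is used as a candidate scorer; a hedged sketch of that sample-and-rank loop, where `sample_candidates` and `gqcnn_score` are hypothetical stand-ins for the released components:

```python
import numpy as np

def plan_grasp(depth_image, sample_candidates, gqcnn_score, n_samples=200):
    """Sample-and-rank grasp planning in the spirit of Dex-Net 2.0 (sketch).

    sample_candidates(depth, n) -> list of candidate grasps (centre pixel, angle,
                                   gripper height above the table)
    gqcnn_score(depth, grasp)   -> predicted grasp robustness in [0, 1]
    Both are hypothetical stand-ins.
    """
    candidates = sample_candidates(depth_image, n_samples)
    scores = [gqcnn_score(depth_image, g) for g in candidates]
    return candidates[int(np.argmax(scores))]   # execute the most robust candidate
```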
Article
Full-text available
This work presents a complete pipeline for folding a pile of clothes using a dual-armed robot. This is a challenging task both from the viewpoint of machine vision and robotic manipulation. The presented pipeline comprises the following parts: isolating and picking up a single garment from a pile of crumpled garments, recognizing its category, unfolding the garment using a series of manipulations performed in the air, placing the garment roughly flat on a work table, spreading it, and, finally, folding it in several steps. The pile is segmented into separate garments using color and texture information, and the ideal grasping point is selected based on the features computed from a depth map. The recognition and unfolding of the hanging garment are performed in an active manner, utilizing the framework of active random forests to detect grasp points, while optimizing the robot actions. The spreading procedure is based on the detection of deformations of the garment’s contour. The perception for folding employs fitting of polygonal models to the contour of the observed garment, both spread and already partially folded. We have conducted several experiments on the full pipeline producing very promising results. To our knowledge, this is the first work addressing the complete unfolding and folding pipeline on a variety of garments, including T-shirts, towels, and shorts.
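The pipeline structure maps naturally onto a loop over garments; a skeleton sketch in which every method name is a hypothetical placeholder for the corresponding module described above, not the authors' API:

```python
def fold_pile(robot, vision):
    """Skeleton of the described folding pipeline (all names are placeholders)."""
    while vision.pile_not_empty():
        grasp = vision.pick_point_on_top_garment()          # segment the pile, choose a grasp
        robot.pick_up(grasp)
        category = vision.recognise_hanging_garment()       # active recognition while hanging
        robot.unfold_in_air(category)                        # regrasping sequence in the air
        robot.place_flat_on_table()
        robot.spread(vision.detect_contour_deformations())  # fix remaining deformations
        for step in vision.folding_plan(category):          # polygonal model fitted to contour
            robot.execute_fold(step)
```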
Article
Full-text available
Robotic manipulation of deformable objects remains a challenging task. One such task is to iron a piece of cloth autonomously. Given a roughly flattened cloth, the goal is to produce an ironing plan with which a robot can iteratively apply a regular iron to remove all the major wrinkles. We present a novel solution to analyze the cloth surface by fusing two surface scan techniques: a curvature scan and a discontinuity scan. The curvature scan can estimate the height deviation of the cloth surface, while the discontinuity scan can effectively detect sharp surface features, such as wrinkles. We use this information to detect the regions that need to be pulled and extended before ironing, and the other regions where we want to detect wrinkles and apply ironing to remove them. We demonstrate that our hybrid scan technique is able to capture and classify wrinkles over the surface robustly. Given detected wrinkles, we enable a robot to iron them using shape features. Experimental results show that, using our wrinkle analysis algorithm, our robot is able to iron the cloth surface and effectively remove the wrinkles.
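A hedged sketch of fusing the two scans into a per-pixel decision map (the thresholds and the exact fusion rule are assumptions for illustration, not values from the paper):

```python
import numpy as np

def ironing_labels(curvature_scan, discontinuity_scan, h_thresh=0.005, d_thresh=0.5):
    """Fuse the two scans into a per-pixel action map (sketch).

    curvature_scan: estimated height deviation of the cloth surface (metres).
    discontinuity_scan: response of a sharp-feature detector.
    Thresholds and the fusion rule are illustrative assumptions.
    """
    wrinkle = (curvature_scan > h_thresh) & (discontinuity_scan > d_thresh)
    large_bump = (curvature_scan > 4 * h_thresh) & ~wrinkle
    labels = np.zeros(curvature_scan.shape, dtype=int)  # 0: flat enough, iron normally
    labels[wrinkle] = 1                                  # 1: wrinkle, iron over it
    labels[large_bump] = 2                               # 2: pull/extend before ironing
    return labels
```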
Article
Estimating the 6D pose of known objects is important for robots to interact with objects in the real world. The problem is challenging due to the variety of objects as well as the complexity of the scene caused by clutter and occlusion between objects. In this work, we introduce a new Convolutional Neural Network (CNN) for 6D object pose estimation named PoseCNN. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. PoseCNN is able to handle symmetric objects and is also robust to occlusion between objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN provides very good estimates using only color as input.
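The translation recovery described above follows directly from the pinhole camera model: with intrinsics (fx, fy, px, py), a localised object centre (cx, cy), and a predicted distance Tz, the translation is Tx = (cx − px)·Tz/fx and Ty = (cy − py)·Tz/fy. A small sketch of that back-projection (helper names are ours, not the released code):

```python
import numpy as np

def translation_from_center(cx, cy, tz, fx, fy, px, py):
    """Back-project the localised object centre (cx, cy) and the predicted
    distance tz into a 3D translation using the pinhole camera model;
    fx, fy, px, py are the camera focal lengths and principal point."""
    tx = (cx - px) * tz / fx
    ty = (cy - py) * tz / fy
    return np.array([tx, ty, tz])

def rotation_from_quaternion(q):
    """Normalise the regressed quaternion before using it as a rotation."""
    q = np.asarray(q, dtype=float)
    return q / np.linalg.norm(q)
```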
Article
We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independently of camera calibration or the current robot pose. This requires the network to observe the spatial relationship between the gripper and objects in the scene, thus learning hand-eye coordination. We then use this network to servo the gripper in real time to achieve successful grasps. To train our network, we collected over 800,000 grasp attempts over the course of two months, using between 6 and 14 robotic manipulators at any given time, with differences in camera placement and hardware. Our experimental evaluation demonstrates that our method achieves effective real-time control, can successfully grasp novel objects, and corrects mistakes by continuous servoing.
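One servoing step amounts to sampling candidate gripper motions, scoring them with the grasp-success network, and moving toward the best ones; a hedged sketch (the original system refits the sampling distribution with the cross-entropy method over a few rounds; `success_cnn` and `sample_motions` are hypothetical stand-ins):

```python
import numpy as np

def servo_step(image, gripper_pose, success_cnn, sample_motions, n=64, top_k=6):
    """One visual-servoing step (sketch): sample candidate task-space motions,
    score each with the grasp-success network, and command the mean of the best.
    `success_cnn(image, motion)` and `sample_motions(pose, n)` (returning an
    (n, d) array of candidate displacements) are hypothetical stand-ins.
    """
    candidates = sample_motions(gripper_pose, n)
    scores = np.array([success_cnn(image, m) for m in candidates])
    elite = candidates[np.argsort(scores)[-top_k:]]   # best-scoring motions
    return elite.mean(axis=0)                         # commanded displacement
```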