Fig. 1: Major challenges in DL for robotics.

Source publication
Article
Full-text available
The ever-increasing complexity of robot applications induces the need for methods to approach problems with no (viable) analytical solution. Deep learning (DL) provides a set of tools to address this kind of problem. This survey presents a categorization of the major challenges in robotics that leverage DL technologies and introduces representativ...

Contexts in source publication

Context 1
... instead of reality [30], [31]. In this survey, we provide a wide-angle review from the perspective of the structures and training strategies used for DL models in robotics. We elaborate on this topic by considering the major challenges in robotics and the related DL solutions. Accordingly, a structured overview of the problems can be seen in Fig. 1, which arranges the landscape along the three major categories: 1) perception; 2) motion; and 3) knowledge adaptation. This categorization is the result of the following ...
Context 2
... the obtained knowledge can be directly utilized in other problems. To avoid, or at least reduce, the tremendous effort of training from scratch, general knowledge has to be extracted from the experience of former solutions and transferred to new ones. Along these points, the three major challenges can be organized into a hierarchical structure (Fig. 1): motion or manipulation is carried out based on the results of the perception process, and adapting the extracted knowledge requires an already existing solution to a similar problem. The category of motion is further divided both vertically and horizontally. The vertical separation is due to the ...
Context 3
... set of tools provided by DL can be applied at any level of the hierarchy shown in Fig. 1. Furthermore, the scope of a DL application can range from a single subtask to the whole hierarchy. When approaching a robot application with a DL solution, one has to decide which part of the application the DL model should be responsible for, and what strategies can be used for its training. In this survey, we aim to support these ...
Context 4
... consider the challenges of modern robot applications that usually benefit from DL solutions. The challenges are organized according to the scheme of Fig. 1, which helps us present the modular aspect of such ...
Context 5
... of the recent learning-based solutions for motion control utilize a so-called end-to-end approach [26], [78]–[82]. Such an approach is based on a DL solution that integrates the whole dataflow into a single computational model. According to Fig. 1, an end-to-end model would take sensory data as its input and output the motor control signals needed to perform the actions (gray dashed line). In a model like this, it is not possible to separate the different levels, such as perception and motion planning. Also, the model includes the solution to problems that could be solved with ...
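To make the end-to-end idea concrete, the following minimal sketch (assuming PyTorch; the layer sizes and the seven motor outputs are purely illustrative assumptions) shows a single network that maps a raw camera image directly to motor commands, with no separable perception or planning stage:

```python
# Minimal sketch of an end-to-end model: one network maps raw sensory input
# (a camera image) directly to motor control signals, so intermediate results
# such as detected objects or planned paths cannot be inspected or reused.
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self, num_joints: int = 7):  # 7-DoF arm is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, num_joints),  # motor commands
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

policy = EndToEndPolicy()
commands = policy(torch.randn(1, 3, 128, 128))  # the gray dashed path in Fig. 1
print(commands.shape)  # torch.Size([1, 7])
```

A modular alternative would expose the perception output explicitly and feed it to a separate planning or control module, which is what makes the individual parts reusable.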
Context 6
... a time-consuming and data-demanding process. If a robot application has a requirement that is satisfied with a DL-based solution, it should be crafted in a way that is easy to adapt to other, similar problems as well. That is why the adaptability of the model always has to be considered, and thus it is represented in the background layer in Fig. 1. The area of transfer learning deals with approaches that leverage the knowledge gained from solving former problems in order to speed up the training process or to enhance the performance of new solutions ...
Context 7
... approaches aim to discover features that are domain-independent [14], [95]. The domain-independent features can be extracted from several different domains (D_1, D_2, ...). ...
Context 8
... }, respectively. x_1i and x_2j are common features if they are identical and their marginal probability distributions in the two domains (D_1, D_2) are the same. If we take a dataset of images of objects to be grasped as an example, identical features can be the red color intensity values of the same pixels in the images of different domains (if the images have the same resolution). ...
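As a hedged illustration of this definition, the sketch below (NumPy and SciPy on synthetic data; the pixel position and the Kolmogorov–Smirnov check are assumptions for the sketch, not part of the surveyed methods) compares the marginal distribution of one pixel's red-channel intensity across two image domains:

```python
# The candidate "common feature": red intensity of the same pixel position,
# sampled across two image domains of equal resolution. The feature counts as
# common if its marginal distribution matches across the domains.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Two hypothetical domains of 8-bit RGB images (e.g. simulation vs. lab camera).
domain_1 = rng.integers(0, 256, size=(500, 64, 64, 3))
domain_2 = rng.integers(0, 256, size=(500, 64, 64, 3))

# Red-channel value of pixel (row 10, col 20) in every image of each domain.
feature_1 = domain_1[:, 10, 20, 0]
feature_2 = domain_2[:, 10, 20, 0]

# A two-sample Kolmogorov-Smirnov test serves as a simple proxy check of
# whether the marginal distributions agree across the two domains.
stat, p_value = ks_2samp(feature_1, feature_2)
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3f}")
```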
Context 9
... heavily influences the adaptability of the solution. We can highlight the advantages and disadvantages of different approaches from this aspect. We especially focus on the in-hierarchy scope of some approaches, along with their training strategies, and argue that larger DL models that incorporate multiple levels of the hierarchy presented in Fig. 1, such as end-to-end methods, are not always preferable to modular DL-based solutions. The use of transfer learning methods and the leveraging of the modular nature of DL models can help in the quick and data-efficient preparation of new solutions. We support our argument with a set of relevant examples and best practices and ...
Context 10
... examples are categorized according to the approaches for achieving such efficient adaptation capabilities, while the order of the presented examples roughly follows moving higher in the hierarchy in Fig. ...
Context 11
... Feature Extraction for Planning: The examples presented so far can be considered solutions for the perception problem. In the case of grasp planning, however, the predictions may also depend on the kinematic model of the end effector, resulting in different strategies for different kinds of end effectors. Thus, according to our categorization in Fig. 1, such challenges are at a higher level in the hierarchy (motion planning). The following examples, which incorporate this knowledge inside the DL model, provide solutions for planning ...
Context 12
... Specify the requirements that the application should fulfil and check if it could benefit from a DL-based solution (this could be done according to Section II). 2) Identify the specific task that the DL model should be responsible for and locate it in the hierarchy of Fig. ...
Context 13
... III or elsewhere. 4) Decide on the model structure (monolithic or modular) based on the scope of the problem (Fig. 1) and the available resources. 5) Carry out a training strategy for the model. If possible, use pretrained models, feature extractors and/or transfer learning, and the incremental end-to-end training approach. If the available training dataset is not large enough for end-to-end fine-tuning, then use the section-wise training strategy (a minimal transfer learning sketch follows below). ...
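A minimal sketch of step 5, assuming PyTorch and torchvision (>= 0.13) are available; the ResNet-18 backbone, the 10-class head, and the hyperparameters are illustrative choices rather than the survey's prescription:

```python
# Reuse a pretrained feature extractor and train only a small task head when
# the dataset is too small for full end-to-end training (section-wise strategy).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained perception module
backbone.fc = nn.Identity()          # expose the 512-d feature vector
backbone.eval()                      # keep batch-norm statistics fixed

for p in backbone.parameters():      # freeze the backbone: only the head is trained
    p.requires_grad = False

head = nn.Linear(512, 10)            # small task-specific head (10 classes assumed)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

images = torch.randn(8, 3, 224, 224)            # placeholder batch
labels = torch.randint(0, 10, (8,))
logits = head(backbone(images))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
# With more data, the backbone could later be unfrozen for end-to-end fine-tuning.
```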

Similar publications

Article
Full-text available
This paper presents a first study on solution representation learning for inducing greater alignment and hence positive transfers between distinct multi-objective optimization tasks that bear discrepancies in their original search spaces. We first establish a novel probabilistic model-based multiobjective transfer evolutionary optimization (TrEO) f...
Preprint
Full-text available
Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or predefined rules, which inevitably result in a cumbersome and expensive learning process. In this article, we introduce a novel init...
Article
Full-text available
Deep Neural Networks (DNNs) trained on one dataset (source domain) do not perform well on another set of data (target domain), which is different but has similar properties as the source domain. Domain Adaptation (DA) strives to alleviate this problem and has great potential in its application in practical settings, real-world scenarios, industrial...
Article
Full-text available
Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all the parameters reduces the efficiency of models both in training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage a...
Article
Full-text available
This paper addresses an incremental learning problem, in which tasks are learned sequentially without access to the previously trained dataset. Catastrophic forgetting is a significant bottleneck to incremental learning as the network performs poorly on previous tasks when it is trained on a new task. We propose an adaptive model search method that...

Citations

... This paper evaluates the effectiveness of the HiDeS NODE in multi-robot systems and opinion dynamics, two key areas of dynamic systems that inherently involve complex interactions and communication (Granha et al., 2022). In multi-robot systems, conventional analytic solutions fall short in high-dimensional control tasks (Károly et al., 2021), such as multi-robot grasping and motion control. NODEs, in contrast, offer a promising avenue for modeling and controlling complex dynamic interactions in a continuous, efficient, and adaptable manner in multi-robot systems. ...
Article
Full-text available
This paper addresses the limitations of current neural ordinary differential equations (NODEs) in modeling and predicting complex dynamics by introducing a novel framework called higher-order-derivative-supervised (HiDeS) NODE. This method extends traditional NODE frameworks by incorporating higher-order derivatives and their interactions into the modeling process, thereby enabling the capture of intricate system behaviors. In addition, the HiDeS NODE employs both the state vector and its higher-order derivatives as supervised signals, which is different from conventional NODEs that utilize only the state vector as a supervised signal. This approach is designed to enhance the predicting capability of NODEs. Through extensive experiments in the complex fields of multi-robot systems and opinion dynamics, the HiDeS NODE demonstrates improved modeling and predicting capabilities over existing models. This research not only proposes an expressive and predictive framework for dynamic systems but also marks the first application of NODEs to the fields of multi-robot systems and opinion dynamics, suggesting broad potential for future interdisciplinary work. The code is available at https://github.com/MengLi-Thea/HiDeS-A-Higher-Order-Derivative-Supervised-Neural-Ordinary-Differential-Equation.
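The core idea of supervising on higher-order derivatives as well as the state can be sketched roughly as follows (a toy Euler-integrated neural ODE in PyTorch; the network, integrator, reference trajectory, and loss weighting are assumptions made for illustration only — see the authors' linked repository for the actual HiDeS implementation):

```python
# Supervise a neural ODE on the state trajectory AND on its first derivative,
# instead of on the state alone.
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))  # dx/dt = f(x)

def rollout(x0: torch.Tensor, steps: int, dt: float):
    xs, dxs = [x0], []
    x = x0
    for _ in range(steps):
        dx = f(x)            # predicted first derivative
        x = x + dt * dx      # explicit Euler step
        dxs.append(dx)
        xs.append(x)
    return torch.stack(xs), torch.stack(dxs)

# Toy reference: circular motion, whose derivative is known analytically.
dt, steps = 0.05, 40
t = torch.arange(steps + 1) * dt
x_ref = torch.stack([torch.cos(t), torch.sin(t)], dim=-1)
dx_ref = torch.stack([-torch.sin(t[:-1]), torch.cos(t[:-1])], dim=-1)

x_pred, dx_pred = rollout(x_ref[0], steps, dt)
# State loss plus a higher-order (derivative) supervision term.
loss = nn.functional.mse_loss(x_pred, x_ref) + 0.1 * nn.functional.mse_loss(dx_pred, dx_ref)
loss.backward()
```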
... -Advanced and non-linear robot control [15,16]; -Applied AI and ML [17,18]; -Robot navigation and SLAM [19]; -Laboratory automation [20]; -AR/VR/XR technologies [21,22]; -Agri-food robotics [23]; -Robotic meat processing (H2020 No 871631) [24][25][26]; -Safety of autonomous vehicles [27]; -Urban air mobility [28]; -Surgical Data Science [29]; -Robot-Assisted Minimally Invasive Surgery improvements (Da Vinci Research Kit) [30][31]; -Image-Guided Surgery (ÓU Consolidator Researcher grant) [32,33]; -Technical and non-technical surgical skill assessment [34][35][36]; -Ultrasound-guided robotics [37]. ...
... Recent technical innovations in deep learning have led to a quantum leap in robot technology and autonomous driving technology [1]. In particular, various sensors, such as cameras, lidar, radar, GPS, ultrasonic waves, and IMUs, are used to acquire and process diverse information related to vehicle situational awareness in order to make driving judgments and control the vehicle [1][2][3]. ...
Article
Full-text available
For autonomous driving, it is imperative to perform various high-computation image recognition tasks with high accuracy, utilizing diverse sensors to perceive the surrounding environment. Specifically, cameras are used to perform lane detection, object detection, and segmentation, and, in the absence of lidar, tasks extend to inferring 3D information through depth estimation, 3D object detection, 3D reconstruction, and SLAM. However, accurately processing all these image recognition operations in real-time for autonomous driving under constrained hardware conditions is practically unfeasible. In this study, considering the characteristics of image recognition tasks performed by these sensors and the given hardware conditions, we investigated MTL (multi-task learning), which enables parallel execution of various image recognition tasks to maximize their processing speed, accuracy, and memory efficiency. Particularly, this study analyzes the combinations of image recognition tasks for autonomous driving and proposes the MDO (multi-task decision and optimization) algorithm, consisting of three steps, as a means for optimization. In the initial step, a MTS (multi-task set) is selected to minimize overall latency while meeting minimum accuracy requirements. Subsequently, additional training of the shared backbone and individual subnets is conducted to enhance accuracy with the predefined MTS. Finally, both the shared backbone and each subnet undergo compression while maintaining the already secured accuracy and latency performance. The experimental results indicate that integrated accuracy performance is critically important in the configuration and optimization of MTL, and this integrated accuracy is determined by the ITC (inter-task correlation). The MDO algorithm was designed to consider these characteristics and construct multi-task sets with tasks that exhibit high ITC. Furthermore, the implementation of the proposed MDO algorithm, coupled with additional SSL (semi-supervised learning) based training, resulted in a significant performance enhancement. This advancement manifested as approximately a 12% increase in object detection mAP performance, a 15% improvement in lane detection accuracy, and a 27% reduction in latency, surpassing the results of previous three-task learning techniques like YOLOP and HybridNet.
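The shared-backbone multi-task structure described here can be sketched as follows (PyTorch; the task set, layer sizes, and head outputs are placeholder assumptions, not the MDO algorithm itself):

```python
# One shared feature extractor feeds several task-specific subnets, so the
# heavy computation is paid once per frame and the heads stay lightweight.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(                      # shared across all tasks
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({                        # one subnet per task
            "detection": nn.Linear(64, 4),                  # e.g. box regression
            "lanes": nn.Linear(64, 2),                      # e.g. lane offsets
            "segmentation": nn.Linear(64, 8),               # e.g. class logits
        })

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)
        return {name: head(feats) for name, head in self.heads.items()}

net = MultiTaskNet()
outputs = net(torch.randn(2, 3, 256, 256))
# A combined loss over the selected task set would be minimized jointly,
# trading per-task accuracy against shared-computation latency.
print({k: v.shape for k, v in outputs.items()})
```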
... Deep convolutional networks have become the preferred choice for a wide range of applications, ranging from robotics (Bai et al., 2020; Ruiz-del-Solar and Loncomilla, 2020; Károly et al., 2021; Yu et al., 2023) to autonomous driving (Bojarski et al., 2017; Kuutti et al., 2019; Plebe et al., 2019a,b, 2024; Plebe and Da Lio, 2020). Convolutional networks overcome a long-standing challenge in computer vision: recognizing objects despite variations in their appearance, such as changes in illumination, size, and viewpoint. ...
Article
Full-text available
This paper proposes a neural network model that estimates the rotation angle of unknown objects from RGB images using an approach inspired by biological neural circuits. The proposed model embeds the understanding of rotational transformations into its architecture, in a way inspired by how rotation is represented in the ellipsoid body of Drosophila. To effectively capture the cyclic nature of rotation, the network's latent space is structured in a circular manner. The rotation operator acts as a shift in the circular latent space's units, establishing a direct correspondence between shifts in the latent space and angular rotations of the object in the world space. Our model accurately estimates the difference in rotation between two views of an object, even for categories of objects that it has never seen before. In addition, our model outperforms three state-of-the-art convolutional networks commonly used as the backbone for vision-based models in robotics.
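A toy sketch of this circular-latent-space idea (PyTorch; the untrained encoder, the 36-unit latent ring, and the 10-degree granularity are assumptions for illustration only, not the published architecture):

```python
# Rotation in the world is modeled as a circular shift of the latent units;
# the estimated rotation between two views is the shift that best aligns
# their latent codes.
import torch
import torch.nn as nn

latent_size = 36                       # one unit per 10 degrees (assumption)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_size))

image = torch.randn(1, 3, 64, 64)
z = encoder(image)

# Rotating the object by k * 10 degrees is modeled as rolling the latent ring.
k = 9                                  # 90-degree rotation
z_rotated = torch.roll(z, shifts=k, dims=-1)

# Recover the rotation by searching for the best-aligning shift.
shifts = torch.stack([torch.roll(z, s, dims=-1) for s in range(latent_size)])
best = torch.argmin(((shifts - z_rotated) ** 2).sum(dim=(-1, -2)))
print(int(best) * 10, "degrees")       # 90
```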
... This development has sparked new thinking in natural language processing and dialogue systems. At the same time, the rapid advancement of robotics technology [66,32] has created a demand for more intelligent and natural human-machine interaction. Combining LLMs with robots can provide robots with stronger natural language understanding and generation capabilities, enabling more intelligent and human-like conversations and interactions. ...
Preprint
Full-text available
The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning. We first provide an overview of the background and development of LLMs for robotics, followed by a description of the benefits of LLMs for robotics and recent advancements in robotics models based on LLMs. We then delve into the various techniques used in the model, including those employed in perception, decision-making, control, and interaction. Finally, we explore the applications of LLMs in robotics and some potential challenges they may face in the near future. Embodied intelligence is the future of intelligent science, and LLMs-based robotics is one of the promising but challenging paths to achieve this.
... Industrial sectors are benefiting from the adoption of time-series analysis to improve the efficiency of their operations [9][10][11]. Although several statistical and Machine Learning (ML) techniques have been applied to time-series forecasting, such as Autoregressive Integrated Moving Average (ARIMA) models or linear regression, in recent years there has been growing interest in applying Deep Learning (DL) models to this task because of their ability to automatically learn complex patterns in data [12,13]. In particular, DL architectures have proven successful in forecasting time-series data with long-term dependencies [14,15], which is highly relevant for many Industry 4.0 applications such as predictive maintenance and fault detection. ...
Article
Full-text available
A novel AGV (Automated Guided Vehicle) control architecture has recently been proposed where the AGV is controlled remotely by a virtual Programmable Logic Controller (PLC), which is deployed on a Multi-access Edge Computing (MEC) platform and connected to the AGV via a radio link in a 5G network. In this scenario, we leverage advanced deep learning techniques based on ensembles of N-BEATS (state-of-the-art in time-series forecasting) to build predictive models that can anticipate the deviation of the AGV’s trajectory even when network perturbations appear. Therefore, corrective maneuvers, such as stopping the AGV, can be performed in advance to avoid potentially harmful situations. The main contribution of this work is an innovative application of the N-BEATS architecture for AGV deviation prediction using sequence-to-sequence modeling. This novel approach allows for a flexible adaptation of the forecast horizon to the AGV operator’s current needs, without the need for model retraining or sacrificing performance. As a second contribution, we extend the N-BEATS architecture to incorporate relevant information from exogenous variables alongside endogenous variables. This joint consideration enables more accurate predictions and enhances the model’s overall performance. The proposed solution was thoroughly evaluated through realistic scenarios in a real factory environment with 5G connectivity and compared against main representatives of deep learning architectures (LSTM), machine learning techniques (Random Forest), and statistical methods (ARIMA) for time-series forecasting. We demonstrate that the deviation of AGVs can be effectively detected by using ensembles of our extended N-BEATS architecture that clearly outperform the other methods. Finally, a careful analysis of a real-time deployment of our solution was conducted, including retraining scenarios that could be triggered by the appearance of data drift problems.
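As a simplified, hedged sketch of this sequence-to-sequence setup (a plain PyTorch MLP rather than the N-BEATS ensemble used in the paper; the window sizes and the exogenous variables are assumptions):

```python
# Predict a horizon of future AGV deviation values from a window of past
# deviations (endogenous) plus network measurements (exogenous).
import torch
import torch.nn as nn

lookback, horizon = 32, 8          # input window and forecast horizon (assumed)
n_exog = 4                         # e.g. latency, jitter, packet loss, throughput

model = nn.Sequential(
    nn.Linear(lookback * (1 + n_exog), 128), nn.ReLU(),
    nn.Linear(128, horizon),       # sequence-to-sequence: the whole horizon at once
)

past_deviation = torch.randn(16, lookback, 1)        # endogenous signal
past_network = torch.randn(16, lookback, n_exog)     # exogenous signals
x = torch.cat([past_deviation, past_network], dim=-1).flatten(1)

forecast = model(x)                # shape: (batch, horizon)
# If the predicted deviation exceeds a safety threshold, the AGV can be
# stopped in advance; retraining can be triggered when data drift appears.
print(forecast.shape)
```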
... The second issue involves domain adaptation problems when trying to recreate simulated emergent behaviors physically, which leads to significant performance reductions [15,36,37]. Finally, the increasing complexity of robotic control algorithms [19] presents a significant obstacle to simulating the multiple physical models required for swarming robotics. Although robotics research has often focused on developing individual capabilities [1], simulating swarming robotics requires the simulation of multiple physical models. ...
Preprint
Full-text available
Agent-based modeling (ABM) and simulation have emerged as important tools for studying emergent behaviors, especially in the context of swarming algorithms for robotic systems. Despite significant research in this area, there is a lack of standardized simulation environments, which hinders the development and deployment of real-world robotic swarms. To address this issue, we present Zespol, a modular, Python-based simulation environment that enables the development and testing of multi-agent control algorithms. Zespol provides a flexible and extensible sandbox for initial research, with the potential for scaling to real-world applications. We provide a topological overview of the system and detailed descriptions of its plug-and-play elements. We demonstrate the fidelity of Zespol in simulated and real-world robotics by replicating existing works highlighting the simulation-to-real gap with the milling behavior. We plan to leverage Zespol's plug-and-play feature for neuromorphic computing in swarming scenarios, which involves using the modules in Zespol to simulate the behavior of neurons and their connections as synapses. This will enable optimizing and studying the emergent behavior of swarm systems in complex environments. Our goal is to gain a better understanding of the interplay between environmental factors and neural-like computations in swarming systems.
... An algorithm based on the neural network technique called ratSLAM is proposed in [83]. The DL algorithm is a part of AI that has multiple levels of representation with nonlinear modules [84]. Researchers have applied DL for SLAM in three aspects, namely semantic mapping [85], loop closure detection [86] and inter-frame estimation [87]. ...
... Deep learning has made major impacts on robotics [1]. ...
Article
Full-text available
Deep learning has revolutionized the field of robotics. To deal with the lack of annotated training samples for learning deep models in robotics, Sim-to-Real transfer has been invented and widely used. However, such deep models trained in a simulation environment typically do not transfer very well to the real world due to the challenging problem of “reality gap”. In response, this paper presents a conceptually new Digital Twin (DT)-CycleGAN framework by integrating the advantages of both DT methodology and the CycleGAN model so that the reality gap can be effectively bridged. Our core innovation is that real and virtual DT robots are forced to mimic each other in a way that the gaps or differences between simulated and realistic robotic behaviors are minimized. To effectively realize this innovation, visual grasping is employed as an exemplar robotic task, and the reality gap in zero-shot Sim-to-Real transfer of visual grasping models is defined as grasping action consistency losses and intrinsically penalized during the DT-CycleGAN training process in realistic simulation environments. Specifically, first, cycle consistency losses between real visual images and simulation images are defined and minimized to reduce the reality gaps in visual appearances during visual grasping tasks. Second, the grasping agent's action consistency losses are defined and penalized to minimize the inconsistency of the grasping agent's actions between the virtual states generated by the DT-CycleGAN generator and the real visual states. By jointly penalizing both the cycle consistency losses and the grasping agent's action consistency losses in DT-CycleGAN, the reality gaps in both visual appearance and grasping action states in simulated and real environments are minimized, thus significantly contributing to the effective and robust zero-shot Sim-to-Real transfer of the trained visual grasping models. Extensive experiments demonstrated the effectiveness and efficiency of our novel DT-CycleGAN framework for zero-shot Sim-to-Real transfer. Our code and models are released on GitHub to facilitate other researchers' work in this promising direction.
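The two loss terms highlighted above can be sketched as follows (PyTorch; the generators, the grasping agent, and all tensor shapes are placeholder assumptions rather than the published DT-CycleGAN code):

```python
# A CycleGAN-style cycle-consistency loss on images plus an action-consistency
# loss that penalizes differences between the agent's actions on real and
# generated (virtual) states.
import torch
import torch.nn as nn

G_real2sim = nn.Conv2d(3, 3, 3, padding=1)   # placeholder generators
G_sim2real = nn.Conv2d(3, 3, 3, padding=1)
agent = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 6))  # grasp action

real_img = torch.randn(4, 3, 64, 64)

# Cycle consistency: real -> sim -> real should reconstruct the input image.
sim_img = G_real2sim(real_img)
cycle_loss = nn.functional.l1_loss(G_sim2real(sim_img), real_img)

# Action consistency: the agent should act the same on real and generated states.
action_loss = nn.functional.mse_loss(agent(sim_img), agent(real_img))

total_loss = cycle_loss + action_loss
total_loss.backward()
```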
... This can be particularly challenging in robotics, where data can be difficult to obtain and may be subject to noise and uncertainty. In addition, robotics applications often require real-time processing, which can be computationally expensive and may require specialized hardware [56]. Furthermore, in order to analyse massive volumes of data, build models, and make predictions in real-time, AI/ML/DL systems need a lot of processing power. ...
Article
Full-text available
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) have revolutionized the field of advanced robotics in recent years. AI, ML, and DL are transforming the field of advanced robotics, making robots more intelligent, efficient, and adaptable to complex tasks and environments. Some of the applications of AI, ML, and DL in advanced robotics include autonomous navigation, object recognition and manipulation, natural language processing, and predictive maintenance. These technologies are also being used in the development of collaborative robots (cobots) that can work alongside humans and adapt to changing environments and tasks. The AI, ML, and DL can be used in advanced transportation systems in order to provide safety, efficiency, and convenience to the passengers and transportation companies. Also, the AI, ML, and DL are playing a critical role in the advancement of manufacturing assembly robots, enabling them to work more efficiently, safely, and intelligently. Furthermore, they have a wide range of applications in aviation management, helping airlines to improve efficiency, reduce costs, and improve customer satisfaction. Moreover, the AI, ML, and DL can help taxi companies in order to provide better, more efficient, and safer services to customers. The research presents an overview of current developments in AI, ML, and DL in advanced robotics systems and discusses various applications of the systems in robot modification. Further research works regarding the applications of AI, ML, and DL in advanced robotics systems are also suggested in order to fill the gaps between the existing studies and published papers. By reviewing the applications of AI, ML, and DL in advanced robotics systems, it is possible to investigate and modify the performances of advanced robots in various applications in order to enhance productivity in advanced robotic industries.