Fig. 1: Major challenges in DL for robotics.

Source publication
Article
Full-text available
The ever-increasing complexity of robot applications induces the need for methods to approach problems with no (viable) analytical solution. Deep learning (DL) provides a set of tools to address this kind of problem. This survey presents a categorization of the major challenges in robotics that leverage DL technologies and introduces representativ...

Contexts in source publication

Context 1
... instead of reality [30], [31]. In this survey, we provide a wide-angle review from the perspective of the structures and training strategies used for DL models in robotics. We elaborate on this topic by considering the major challenges in robotics and the related DL solutions. Accordingly, a structured overview of the problems can be seen in Fig. 1, which arranges the landscape along the three major categories: 1) perception; 2) motion; and 3) knowledge adaptation. This categorization is the result of the following ...
Context 2
... the obtained knowledge can be directly utilized in other problems. To avoid, or at least reduce, the tremendous effort of training from scratch, general knowledge has to be extracted from the experience of former solutions and transferred to new ones. Along these points, the three major challenges can be organized into a hierarchical structure (Fig. 1): motion or manipulation is carried out based on the results of the perception process, and adapting the extracted knowledge requires an already existing solution to a similar problem. The category of motion is further divided both vertically and horizontally. The vertical separation is due to the ...
Context 3
... set of tools provided by DL can be applied at any level of the hierarchy shown in Fig. 1. Furthermore, the scope of a DL application can range from a single subtask to the whole hierarchy. When approaching a robot application with a DL solution, one has to decide which part of the application the DL model should be responsible for, and what strategies can be used for its training. In this survey, we aim to support these ...
Context 4
... consider the challenges of modern robot applications that usually benefit from DL solutions. The challenges are organized according to the scheme of Fig. 1, which helps us present the modular aspect of such ...
Context 5
... of the recent learning-based solutions for motion control utilize a so-called end-to-end approach [26], [78]–[82]. Such an approach is based on a DL solution that integrates the whole dataflow into a single computational model. According to Fig. 1, an end-to-end model would take sensory data as its input and output the motor control signals needed to perform the actions (gray dashed line). In a model like this, it is not possible to separate the different levels, such as perception and motion planning. Also, the model includes the solution to problems that could be solved with ...
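To make the end-to-end idea concrete, the following minimal sketch (assuming PyTorch; the layer sizes and the seven motor outputs are purely illustrative assumptions) shows a single network that maps a raw camera image directly to motor commands, with no separable perception or planning stage:

```python
# Minimal sketch of an end-to-end model: one network maps raw sensory input
# (a camera image) directly to motor control signals, so intermediate results
# such as detected objects or planned paths cannot be inspected or reused.
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self, num_joints: int = 7):  # 7-DoF arm is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, num_joints),  # motor commands
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

policy = EndToEndPolicy()
commands = policy(torch.randn(1, 3, 128, 128))  # the gray dashed path in Fig. 1
print(commands.shape)  # torch.Size([1, 7])
```

A modular alternative would expose the perception output explicitly and feed it to a separate planning or control module, which is what makes the individual parts reusable.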
Context 6
... a time-consuming and data-demanding process. If a robot application has a requirement that is satisfied with a DL-based solution, it should be crafted in a way that is easy to adapt to other, similar problems as well. That is why the adaptability of the model always has to be considered, and thus it is represented in the background layer in Fig. 1. The area of transfer learning deals with approaches that leverage the knowledge gained from solving former problems in order to speed up the training process or to enhance the performance of new solutions ...
Context 7
... approaches aim to discover features that are domain-independent [14], [95]. The domain-independent features can be extracted from several different domains (D_1, D_2, ...). ...
Context 8
... }, respectively. x_1i and x_2j are common features if they are identical and their marginal probability distributions in the two domains (D_1, D_2) are the same. If we take a dataset of images of objects to be grasped as an example, identical features can be the red color intensity values of the same pixels in the images of different domains (if the images have the same resolution). ...
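As a hedged illustration of this definition, the sketch below (NumPy and SciPy on synthetic data; the pixel position and the Kolmogorov–Smirnov check are assumptions for the sketch, not part of the surveyed methods) compares the marginal distribution of one pixel's red-channel intensity across two image domains:

```python
# The candidate "common feature": red intensity of the same pixel position,
# sampled across two image domains of equal resolution. The feature counts as
# common if its marginal distribution matches across the domains.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Two hypothetical domains of 8-bit RGB images (e.g. simulation vs. lab camera).
domain_1 = rng.integers(0, 256, size=(500, 64, 64, 3))
domain_2 = rng.integers(0, 256, size=(500, 64, 64, 3))

# Red-channel value of pixel (row 10, col 20) in every image of each domain.
feature_1 = domain_1[:, 10, 20, 0]
feature_2 = domain_2[:, 10, 20, 0]

# A two-sample Kolmogorov-Smirnov test serves as a simple proxy check of
# whether the marginal distributions agree across the two domains.
stat, p_value = ks_2samp(feature_1, feature_2)
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3f}")
```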
Context 9
... heavily influences the adaptability of the solution. We can highlight the advantages and disadvantages of different approaches from this aspect. We especially focus on the in-hierarchy scope of some approaches, along with their training strategies, and argue that larger DL models that incorporate multiple levels of the hierarchy presented in Fig. 1, such as end-to-end methods, are not always preferable to modular DL-based solutions. The use of transfer learning methods and the leveraging of the modular nature of DL models can help in the quick and data-efficient preparation of new solutions. We support our argument with a set of relevant examples and best practices and ...
Context 10
... examples are categorized according to the approaches for achieving such efficient adaptation capabilities, while the order of the presented examples roughly follows moving higher in the hierarchy in Fig. ...
Context 11
... Feature Extraction for Planning: The examples presented so far can be considered solutions for the perception problem. In the case of grasp planning, however, the predictions may also depend on the kinematic model of the end effector, resulting in different strategies for different kinds of end effectors. Thus, according to our categorization in Fig. 1, such challenges are at a higher level in the hierarchy (motion planning). The following examples, which incorporate this knowledge inside the DL model, provide solutions for planning ...
Context 12
... Specify the requirements that the application should fulfil and check if it could benefit from a DL-based solution (this could be done according to Section II). 2) Identify the specific task that the DL model should be responsible for and locate it in the hierarchy of Fig. ...
Context 13
... III or elsewhere. 4) Decide on the model structure (monolithic or modular) based on the scope of the problem (Fig. 1) and the available resources. 5) Carry out a training strategy for the model. If possible, use pretrained models, feature extractors and/or transfer learning, and the incremental end-to-end training approach. If the available training dataset is not large enough for end-to-end fine-tuning, then use the section-wise training strategy (a minimal transfer learning sketch follows below). ...
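A minimal sketch of step 5, assuming PyTorch and torchvision (>= 0.13) are available; the ResNet-18 backbone, the 10-class head, and the hyperparameters are illustrative choices rather than the survey's prescription:

```python
# Reuse a pretrained feature extractor and train only a small task head when
# the dataset is too small for full end-to-end training (section-wise strategy).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained perception module
backbone.fc = nn.Identity()          # expose the 512-d feature vector
backbone.eval()                      # keep batch-norm statistics fixed

for p in backbone.parameters():      # freeze the backbone: only the head is trained
    p.requires_grad = False

head = nn.Linear(512, 10)            # small task-specific head (10 classes assumed)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

images = torch.randn(8, 3, 224, 224)            # placeholder batch
labels = torch.randint(0, 10, (8,))
logits = head(backbone(images))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
# With more data, the backbone could later be unfrozen for end-to-end fine-tuning.
```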

Similar publications

Article
Full-text available
This paper presents a first study on solution representation learning for inducing greater alignment and hence positive transfers between distinct multi-objective optimization tasks that bear discrepancies in their original search spaces. We first establish a novel probabilistic model-based multiobjective transfer evolutionary optimization (TrEO) f...
Preprint
Full-text available
Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or predefined rules, which inevitably result in a cumbersome and expensive learning process. In this article, we introduce a novel init...
Article
Full-text available
Deep Neural Networks (DNNs) trained on one dataset (source domain) do not perform well on another set of data (target domain), which is different but has similar properties as the source domain. Domain Adaptation (DA) strives to alleviate this problem and has great potential in its application in practical settings, real-world scenarios, industrial...
Article
Full-text available
Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all the parameters reduces the efficiency of models both in training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage a...
Article
Full-text available
This paper addresses an incremental learning problem, in which tasks are learned sequentially without access to the previously trained dataset. Catastrophic forgetting is a significant bottleneck to incremental learning as the network performs poorly on previous tasks when it is trained on a new task. We propose an adaptive model search method that...

Citations

... This paper evaluates the effectiveness of the HiDeS NODE in multi-robot systems and opinion dynamics, two key areas of dynamic systems that inherently involve complex interactions and communication (Granha et al., 2022). In multi-robot systems, conventional analytic solutions fall short in high-dimensional control tasks (Károly et al., 2021), such as multi-robot grasping and motion control. NODEs, in contrast, offer a promising avenue for modeling and controlling complex dynamic interactions in a continuous, efficient, and adaptable manner in multi-robot systems. ...
Article
Full-text available
This paper addresses the limitations of current neural ordinary differential equations (NODEs) in modeling and predicting complex dynamics by introducing a novel framework called higher-order-derivative-supervised (HiDeS) NODE. This method extends traditional NODE frameworks by incorporating higher-order derivatives and their interactions into the modeling process, thereby enabling the capture of intricate system behaviors. In addition, the HiDeS NODE employs both the state vector and its higher-order derivatives as supervised signals, which is different from conventional NODEs that utilize only the state vector as a supervised signal. This approach is designed to enhance the predicting capability of NODEs. Through extensive experiments in the complex fields of multi-robot systems and opinion dynamics, the HiDeS NODE demonstrates improved modeling and predicting capabilities over existing models. This research not only proposes an expressive and predictive framework for dynamic systems but also marks the first application of NODEs to the fields of multi-robot systems and opinion dynamics, suggesting broad potential for future interdisciplinary work. The code is available at https://github.com/MengLi-Thea/HiDeS-A-Higher-Order-Derivative-Supervised-Neural-Ordinary-Differential-Equation.
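The core idea of supervising on higher-order derivatives as well as the state can be sketched roughly as follows (a toy Euler-integrated neural ODE in PyTorch; the network, integrator, reference trajectory, and loss weighting are assumptions made for illustration only — see the authors' linked repository for the actual HiDeS implementation):

```python
# Supervise a neural ODE on the state trajectory AND on its first derivative,
# instead of on the state alone.
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))  # dx/dt = f(x)

def rollout(x0: torch.Tensor, steps: int, dt: float):
    xs, dxs = [x0], []
    x = x0
    for _ in range(steps):
        dx = f(x)            # predicted first derivative
        x = x + dt * dx      # explicit Euler step
        dxs.append(dx)
        xs.append(x)
    return torch.stack(xs), torch.stack(dxs)

# Toy reference: circular motion, whose derivative is known analytically.
dt, steps = 0.05, 40
t = torch.arange(steps + 1) * dt
x_ref = torch.stack([torch.cos(t), torch.sin(t)], dim=-1)
dx_ref = torch.stack([-torch.sin(t[:-1]), torch.cos(t[:-1])], dim=-1)

x_pred, dx_pred = rollout(x_ref[0], steps, dt)
# State loss plus a higher-order (derivative) supervision term.
loss = nn.functional.mse_loss(x_pred, x_ref) + 0.1 * nn.functional.mse_loss(dx_pred, dx_ref)
loss.backward()
```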
... -Advanced and non-linear robot control [15,16]; -Applied AI and ML [17,18]; -Robot navigation and SLAM [19]; -Laboratory automation [20]; -AR/VR/XR technologies [21,22]; -Agri-food robotics [23]; -Robotic meat processing (H2020 No 871631) [24][25][26]; -Safety of autonomous vehicles [27]; -Urban air mobility [28]; -Surgical Data Science [29]; -Robot-Assisted Minimally Invasive Surgery improvements (Da Vinci Research Kit) [30][31]; -Image-Guided Surgery (ÓU Consolidator Researcher grant) [32,33]; -Technical and non-technical surgical skill assessment [34][35][36]; -Ultrasound-guided robotics [37]. ...
... Recent technical innovations in deep learning have led to a quantum leap in robot technology and autonomous driving technology [1]. In particular, various sensors, such as cameras, lidar, radar, GPS, ultrasonic waves, and IMUs, are used to acquire and process diverse information related to vehicle situational awareness in order to make driving judgments and control the vehicle [1][2][3]. ...
Article
Full-text available
For autonomous driving, it is imperative to perform various high-computation image recognition tasks with high accuracy, utilizing diverse sensors to perceive the surrounding environment. Specifically, cameras are used to perform lane detection, object detection, and segmentation, and, in the absence of lidar, tasks extend to inferring 3D information through depth estimation, 3D object detection, 3D reconstruction, and SLAM. However, accurately processing all these image recognition operations in real-time for autonomous driving under constrained hardware conditions is practically unfeasible. In this study, considering the characteristics of image recognition tasks performed by these sensors and the given hardware conditions, we investigated MTL (multi-task learning), which enables parallel execution of various image recognition tasks to maximize their processing speed, accuracy, and memory efficiency. Particularly, this study analyzes the combinations of image recognition tasks for autonomous driving and proposes the MDO (multi-task decision and optimization) algorithm, consisting of three steps, as a means for optimization. In the initial step, a MTS (multi-task set) is selected to minimize overall latency while meeting minimum accuracy requirements. Subsequently, additional training of the shared backbone and individual subnets is conducted to enhance accuracy with the predefined MTS. Finally, both the shared backbone and each subnet undergo compression while maintaining the already secured accuracy and latency performance. The experimental results indicate that integrated accuracy performance is critically important in the configuration and optimization of MTL, and this integrated accuracy is determined by the ITC (inter-task correlation). The MDO algorithm was designed to consider these characteristics and construct multi-task sets with tasks that exhibit high ITC. Furthermore, the implementation of the proposed MDO algorithm, coupled with additional SSL (semi-supervised learning) based training, resulted in a significant performance enhancement. This advancement manifested as approximately a 12% increase in object detection mAP performance, a 15% improvement in lane detection accuracy, and a 27% reduction in latency, surpassing the results of previous three-task learning techniques like YOLOP and HybridNet.
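The shared-backbone multi-task structure described here can be sketched as follows (PyTorch; the task set, layer sizes, and head outputs are placeholder assumptions, not the MDO algorithm itself):

```python
# One shared feature extractor feeds several task-specific subnets, so the
# heavy computation is paid once per frame and the heads stay lightweight.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(                      # shared across all tasks
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({                        # one subnet per task
            "detection": nn.Linear(64, 4),                  # e.g. box regression
            "lanes": nn.Linear(64, 2),                      # e.g. lane offsets
            "segmentation": nn.Linear(64, 8),               # e.g. class logits
        })

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)
        return {name: head(feats) for name, head in self.heads.items()}

net = MultiTaskNet()
outputs = net(torch.randn(2, 3, 256, 256))
# A combined loss over the selected task set would be minimized jointly,
# trading per-task accuracy against shared-computation latency.
print({k: v.shape for k, v in outputs.items()})
```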
... Deep convolutional networks have become the preferred choice for a wide range of applications, ranging from robotics (Bai et al., 2020; Ruiz-del-Solar and Loncomilla, 2020; Károly et al., 2021; Yu et al., 2023) to autonomous driving (Bojarski et al., 2017; Kuutti et al., 2019; Plebe et al., 2019a,b, 2024; Plebe and Da Lio, 2020). Convolutional networks overcome a long-standing challenge in computer vision: recognizing objects despite variations in their appearance, such as changes in illumination, size, and viewpoint. ...
Article
Full-text available
This paper proposes a neural network model that estimates the rotation angle of unknown objects from RGB images using an approach inspired by biological neural circuits. The proposed model embeds the understanding of rotational transformations into its architecture, in a way inspired by how rotation is represented in the ellipsoid body of Drosophila. To effectively capture the cyclic nature of rotation, the network's latent space is structured in a circular manner. The rotation operator acts as a shift in the circular latent space's units, establishing a direct correspondence between shifts in the latent space and angular rotations of the object in the world space. Our model accurately estimates the difference in rotation between two views of an object, even for categories of objects that it has never seen before. In addition, our model outperforms three state-of-the-art convolutional networks commonly used as the backbone for vision-based models in robotics.
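A toy sketch of this circular-latent-space idea (PyTorch; the untrained encoder, the 36-unit latent ring, and the 10-degree granularity are assumptions for illustration only, not the published architecture):

```python
# Rotation in the world is modeled as a circular shift of the latent units;
# the estimated rotation between two views is the shift that best aligns
# their latent codes.
import torch
import torch.nn as nn

latent_size = 36                       # one unit per 10 degrees (assumption)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_size))

image = torch.randn(1, 3, 64, 64)
z = encoder(image)

# Rotating the object by k * 10 degrees is modeled as rolling the latent ring.
k = 9                                  # 90-degree rotation
z_rotated = torch.roll(z, shifts=k, dims=-1)

# Recover the rotation by searching for the best-aligning shift.
shifts = torch.stack([torch.roll(z, s, dims=-1) for s in range(latent_size)])
best = torch.argmin(((shifts - z_rotated) ** 2).sum(dim=(-1, -2)))
print(int(best) * 10, "degrees")       # 90
```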
... This development has sparked new thinking in natural language processing and dialogue systems. At the same time, the rapid advancement of robotics technology [66,32] has created a demand for more intelligent and natural human-machine interaction. Combining LLMs with robots can provide robots with stronger natural language understanding and generation capabilities, enabling more intelligent and human-like conversations and interactions. ...
Preprint
Full-text available
The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning. We first provide an overview of the background and development of LLMs for robotics, followed by a description of the benefits of LLMs for robotics and recent advancements in robotics models based on LLMs. We then delve into the various techniques used in the model, including those employed in perception, decision-making, control, and interaction. Finally, we explore the applications of LLMs in robotics and some potential challenges they may face in the near future. Embodied intelligence is the future of intelligent science, and LLMs-based robotics is one of the promising but challenging paths to achieve this.
... Industrial sectors are benefiting from the adoption of time-series analysis to improve the efficiency of their operations [9][10][11]. Although several statistical and Machine Learning (ML) techniques have been applied to time-series forecasting, such as Autoregressive Integrated Moving Average (ARIMA) models or linear regression, in recent years there has been growing interest in applying Deep Learning (DL) models to this task because of their ability to automatically learn complex patterns in data [12,13]. In particular, DL architectures have proven successful in forecasting time-series data with long-term dependencies [14,15], which is highly relevant for many Industry 4.0 applications such as predictive maintenance and fault detection. ...
Article
Full-text available
A novel AGV (Automated Guided Vehicle) control architecture has recently been proposed where the AGV is controlled remotely by a virtual Programmable Logic Controller (PLC), which is deployed on a Multi-access Edge Computing (MEC) platform and connected to the AGV via a radio link in a 5G network. In this scenario, we leverage advanced deep learning techniques based on ensembles of N-BEATS (state-of-the-art in time-series forecasting) to build predictive models that can anticipate the deviation of the AGV’s trajectory even when network perturbations appear. Therefore, corrective maneuvers, such as stopping the AGV, can be performed in advance to avoid potentially harmful situations. The main contribution of this work is an innovative application of the N-BEATS architecture for AGV deviation prediction using sequence-to-sequence modeling. This novel approach allows for a flexible adaptation of the forecast horizon to the AGV operator’s current needs, without the need for model retraining or sacrificing performance. As a second contribution, we extend the N-BEATS architecture to incorporate relevant information from exogenous variables alongside endogenous variables. This joint consideration enables more accurate predictions and enhances the model’s overall performance. The proposed solution was thoroughly evaluated through realistic scenarios in a real factory environment with 5G connectivity and compared against main representatives of deep learning architectures (LSTM), machine learning techniques (Random Forest), and statistical methods (ARIMA) for time-series forecasting. We demonstrate that the deviation of AGVs can be effectively detected by using ensembles of our extended N-BEATS architecture that clearly outperform the other methods. Finally, a careful analysis of a real-time deployment of our solution was conducted, including retraining scenarios that could be triggered by the appearance of data drift problems.
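As a simplified, hedged sketch of this sequence-to-sequence setup (a plain PyTorch MLP rather than the N-BEATS ensemble used in the paper; the window sizes and the exogenous variables are assumptions):

```python
# Predict a horizon of future AGV deviation values from a window of past
# deviations (endogenous) plus network measurements (exogenous).
import torch
import torch.nn as nn

lookback, horizon = 32, 8          # input window and forecast horizon (assumed)
n_exog = 4                         # e.g. latency, jitter, packet loss, throughput

model = nn.Sequential(
    nn.Linear(lookback * (1 + n_exog), 128), nn.ReLU(),
    nn.Linear(128, horizon),       # sequence-to-sequence: the whole horizon at once
)

past_deviation = torch.randn(16, lookback, 1)        # endogenous signal
past_network = torch.randn(16, lookback, n_exog)     # exogenous signals
x = torch.cat([past_deviation, past_network], dim=-1).flatten(1)

forecast = model(x)                # shape: (batch, horizon)
# If the predicted deviation exceeds a safety threshold, the AGV can be
# stopped in advance; retraining can be triggered when data drift appears.
print(forecast.shape)
```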
... The second issue involves domain adaptation problems when trying to recreate simulated emergent behaviors physically, which leads to significant performance reductions [15,36,37]. Finally, the increasing complexity of robotic control algorithms [19] presents a significant obstacle to simulating the multiple physical models required for swarming robotics. Although robotics research has often focused on developing individual capabilities [1], simulating swarming robotics requires the simulation of multiple physical models. ...
Preprint
Full-text available
Agent-based modeling (ABM) and simulation have emerged as important tools for studying emergent behaviors, especially in the context of swarming algorithms for robotic systems. Despite significant research in this area, there is a lack of standardized simulation environments, which hinders the development and deployment of real-world robotic swarms. To address this issue, we present Zespol, a modular, Python-based simulation environment that enables the development and testing of multi-agent control algorithms. Zespol provides a flexible and extensible sandbox for initial research, with the potential for scaling to real-world applications. We provide a topological overview of the system and detailed descriptions of its plug-and-play elements. We demonstrate the fidelity of Zespol in simulated and real-world robotics by replicating existing works highlighting the simulation-to-real gap with the milling behavior. We plan to leverage Zespol's plug-and-play feature for neuromorphic computing in swarming scenarios, which involves using the modules in Zespol to simulate the behavior of neurons and their connections as synapses. This will enable optimizing and studying the emergent behavior of swarm systems in complex environments. Our goal is to gain a better understanding of the interplay between environmental factors and neural-like computations in swarming systems.
... An algorithm based on the neural network technique called ratSLAM is proposed in [83]. The DL algorithm is a part of AI that has multiple levels of representation with nonlinear modules [84]. Researchers have applied DL for SLAM in three aspects, namely semantic mapping [85], loop closure detection [86] and inter-frame estimation [87]. ...
... Deep learning has made major impacts on robotics [1]. ...
Article
Full-text available
Deep learning has revolutionized the field of robotics. To deal with the lack of annotated training samples for learning deep models in robotics, Sim-to-Real transfer has been invented and widely used. However, such deep models trained in a simulation environment typically do not transfer very well to the real world due to the challenging problem of “reality gap”. In response, this paper presents a conceptually new Digital Twin (DT)-CycleGAN framework by integrating the advantages of both DT methodology and the CycleGAN model so that the reality gap can be effectively bridged. Our core innovation is that real and virtual DT robots are forced to mimic each other in a way that the gaps or differences between simulated and realistic robotic behaviors are minimized. To effectively realize this innovation, visual grasping is employed as an exemplar robotic task, and the reality gap in zero-shot Sim-to-Real transfer of visual grasping models is defined as grasping action consistency losses and intrinsically penalized during the DT-CycleGAN training process in realistic simulation environments. Specifically, first, cycle consistency losses between real visual images and simulation images are defined and minimized to reduce the reality gaps in visual appearances during visual grasping tasks. Second, the grasping agent's action consistency losses are defined and penalized to minimize the inconsistency of the grasping agent's actions between the virtual states generated by the DT-CycleGAN generator and the real visual states. By jointly penalizing both the cycle consistency losses and the grasping agent's action consistency losses in DT-CycleGAN, the reality gaps in both visual appearance and grasping action states in simulated and real environments are minimized, thus significantly contributing to the effective and robust zero-shot Sim-to-Real transfer of the trained visual grasping models. Extensive experiments demonstrated the effectiveness and efficiency of our novel DT-CycleGAN framework for zero-shot Sim-to-Real transfer. Our code and models are released on GitHub to facilitate other researchers' work in this promising direction.
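The two loss terms highlighted above can be sketched as follows (PyTorch; the generators, the grasping agent, and all tensor shapes are placeholder assumptions rather than the published DT-CycleGAN code):

```python
# A CycleGAN-style cycle-consistency loss on images plus an action-consistency
# loss that penalizes differences between the agent's actions on real and
# generated (virtual) states.
import torch
import torch.nn as nn

G_real2sim = nn.Conv2d(3, 3, 3, padding=1)   # placeholder generators
G_sim2real = nn.Conv2d(3, 3, 3, padding=1)
agent = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 6))  # grasp action

real_img = torch.randn(4, 3, 64, 64)

# Cycle consistency: real -> sim -> real should reconstruct the input image.
sim_img = G_real2sim(real_img)
cycle_loss = nn.functional.l1_loss(G_sim2real(sim_img), real_img)

# Action consistency: the agent should act the same on real and generated states.
action_loss = nn.functional.mse_loss(agent(sim_img), agent(real_img))

total_loss = cycle_loss + action_loss
total_loss.backward()
```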
... This can be particularly challenging in robotics, where data can be difficult to obtain and may be subject to noise and uncertainty. In addition, robotics applications often require real-time processing, which can be computationally expensive and may require specialized hardware [56]. Furthermore, in order to analyse massive volumes of data, build models, and make predictions in real-time, AI/ML/DL systems need a lot of processing power. ...
Article
Full-text available
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) have revolutionized the field of advanced robotics in recent years. AI, ML, and DL are transforming the field of advanced robotics, making robots more intelligent, efficient, and adaptable to complex tasks and environments. Some of the applications of AI, ML, and DL in advanced robotics include autonomous navigation, object recognition and manipulation, natural language processing, and predictive maintenance. These technologies are also being used in the development of collaborative robots (cobots) that can work alongside humans and adapt to changing environments and tasks. The AI, ML, and DL can be used in advanced transportation systems in order to provide safety, efficiency, and convenience to the passengers and transportation companies. Also, the AI, ML, and DL are playing a critical role in the advancement of manufacturing assembly robots, enabling them to work more efficiently, safely, and intelligently. Furthermore, they have a wide range of applications in aviation management, helping airlines to improve efficiency, reduce costs, and improve customer satisfaction. Moreover, the AI, ML, and DL can help taxi companies in order to provide better, more efficient, and safer services to customers. The research presents an overview of current developments in AI, ML, and DL in advanced robotics systems and discusses various applications of the systems in robot modification. Further research works regarding the applications of AI, ML, and DL in advanced robotics systems are also suggested in order to fill the gaps between the existing studies and published papers. By reviewing the applications of AI, ML, and DL in advanced robotics systems, it is possible to investigate and modify the performances of advanced robots in various applications in order to enhance productivity in advanced robotic industries.