a) The field setup for the RoboCup Standard Platform League (SPL), and b) a snapshot from an SPL game showing the Nao robots playing soccer.

Source publication
Article
Full-text available
A robot can perform a given task through a policy that maps its sensed state to appropriate actions. We assume that a hand-coded controller can achieve such a mapping only for the basic cases of the task. Refining the controller becomes harder, more tedious, and more error-prone as the complexity of the task increases. In this paper, we present a...
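The idea sketched in this abstract, a hand-coded controller complemented by human corrections, can be illustrated as follows. This is a minimal sketch, not the paper's implementation: the nearest-neighbor lookup, the similarity threshold, and all names are illustrative assumptions.

```python
import numpy as np

class CorrectiveController:
    """Hand-coded base policy augmented with human corrective demonstrations."""

    def __init__(self, base_policy, similarity_threshold=0.1):
        self.base_policy = base_policy   # the hand-coded controller
        self.corrections = []            # stored (state, action) pairs
        self.threshold = similarity_threshold

    def record_correction(self, state, action):
        """Store a corrective demonstration given by the human teacher."""
        self.corrections.append((np.asarray(state), action))

    def act(self, state):
        """Prefer the nearest stored correction; otherwise fall back to the base controller."""
        state = np.asarray(state)
        if self.corrections:
            dists = [np.linalg.norm(state - s) for s, _ in self.corrections]
            i = int(np.argmin(dists))
            if dists[i] < self.threshold:
                return self.corrections[i][1]
        return self.base_policy(state)
```

The design point is that demonstrations never replace the controller wholesale; they override it only near states where the teacher intervened.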

Similar publications

Article
Full-text available
Robots are becoming ever more autonomous. This expanding ability to make unsupervised decisions renders it imperative that mechanisms are in place to guarantee the safety of the behaviours executed by the robot. Moreover, smart autonomous robots should be more than safe; they should also be explicitly ethical -- able to both choose and justify actions...

Citations

... Indeed, many learning paradigms include human-in-the-loop approaches, and we believe these should be taken into account. These include active learning [46], learning by demonstration [155], and corrective human feedback learning [132], used within the context of interactions in applications involving human teachers, such as learning-by-teaching educational scenarios [94] or general collaborative scenarios [34]. As a result, we extend the definition from Beer et al. [22] to make it applicable to social robots, and define autonomy of a social robot as follows: ...
... Those dimensions are generally separated in the design and implementation of most robots; as a result, intelligence and autonomy on each dimension may be completely different. For example, some semi-autonomous robots include completely human-controlled perception [183], or rely on human input for learning [46,155,132] or for verifying the suitability of robot plans [61]. ...
Chapter
Social robots are becoming increasingly diverse in their design, behavior, and usage. In this chapter, we provide a broad-ranging overview of the main characteristics that arise when one considers social robots and their interactions with humans. We specifically contribute a framework for characterizing social robots along seven dimensions that we found to be most relevant to their design. These dimensions are: appearance, social capabilities, purpose and application area, relational role, autonomy and intelligence, proximity, and temporal profile. Within each dimension, we account for the variety of social robots through a combination of classifications and/or explanations. Our framework builds on and goes beyond existing frameworks, such as classifications and taxonomies found in the literature. More specifically, it contributes to the unification, clarification, and extension of key concepts, drawing from a rich body of relevant literature. This chapter is meant to serve as a resource for researchers, designers, and developers within and outside the field of social robotics. It is intended to provide them with tools to better understand and position existing social robots, as well as to inform their future design.
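The chapter's framework lends itself to a simple data-structure view. A minimal sketch, with field names mirroring the seven dimensions listed in the abstract; the free-text values and the example robot are illustrative placeholders, not taken from the chapter.

```python
from dataclasses import dataclass

@dataclass
class SocialRobotProfile:
    """Characterizes a social robot along the chapter's seven dimensions."""
    appearance: str             # e.g. "humanoid", "zoomorphic"
    social_capabilities: str    # e.g. "speech, gaze, gestures"
    purpose_application: str    # e.g. "education", "healthcare"
    relational_role: str        # e.g. "tutor", "companion"
    autonomy_intelligence: str  # e.g. "semi-autonomous, human-in-the-loop learning"
    proximity: str              # e.g. "co-located", "remote"
    temporal_profile: str       # e.g. "one-off demo", "long-term companion"

# Example: positioning a hypothetical tutoring robot within the framework
tutor_robot = SocialRobotProfile(
    appearance="humanoid",
    social_capabilities="speech, gestures",
    purpose_application="education",
    relational_role="tutor",
    autonomy_intelligence="semi-autonomous",
    proximity="co-located",
    temporal_profile="repeated short sessions",
)
```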
... The concept of autonomy should also account for learning (Baraka, Alves-Oliveira, & Ribeiro, 2020). Indeed, many learning paradigms include human-in-the-loop approaches, such as active learning (Chao, Cakmak, & Thomaz, 2010), learning by demonstration (Rybski, Yoon, Stolarz, & Veloso, 2007), and corrective human feedback learning (Meriçli, Veloso, & Akın, 2011), used within the context of interactions in applications involving human teachers, such as learning-by-teaching educational scenarios (Jacq, Lemaignan, Garcia, Dillenbourg, & Paiva, 2016) or general collaborative scenarios (Breazeal, Hoffman, & Lockerd, 2004). ...
Thesis
Full-text available
Creativity is an ability with psychological and developmental benefits. Creative levels are dynamic and oscillate throughout life, with a first major decline occurring at the age of 7. However, creativity can be nurtured if trained, with evidence suggesting an increase in this ability with the use of validated creativity training. Yet, creativity training for young children (aged 6-9) appears scarce. Additionally, existing training interventions resemble test-like formats and lack the playful dynamics that could engage children in creative practices over time. This PhD project aimed at contributing to creativity stimulation in children by proposing social robots as intervention tools, thus adding playful and interactive dynamics to the training. Towards this goal, we conducted three studies in schools, summer camps, and museums for children, which contributed to the design, fabrication, and experimental testing of a robot whose purpose was to re-balance creative levels. Study 1 (n = 140) tested the effect of existing activities with robots on creativity and provided initial evidence of the positive potential of robots for creativity training. Study 2 (n = 134) included children as co-designers of the robot, ensuring the robot's design meets children's needs and requirements. Study 3 (n = 130) investigated the effectiveness of this robot as a tool for creativity training, showing the potential of robots as creativity intervention tools. In sum, this PhD project showed that robots can have a positive effect on boosting children's creativity. This places social robots as promising tools for psychological interventions.
... The feedback given by a human teacher can be corrective in the action domain. The user can interrupt the agent while it is executing a policy in order to make improvements (Meriçli, Veloso, and Akın 2011). The user provides demonstrations for the current state, and this new data is attached to the policy so that it can be executed in similar states. ...
Article
Full-text available
Robot learning problems are limited by physical constraints that make learning successful policies for complex motor skills on real systems infeasible. Some Reinforcement Learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas Interactive Machine Learning and Learning from Demonstration methods allow fast transfer of human knowledge to the agent. However, most such methods require expert demonstrations. In this work, we propose the use of human corrective advice in the action domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. Using both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher can be easily transferred to the agent, while Policy Search with the cost/reward function supervises the process and reduces the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several DoFs in via-point reaching movements, and also using real robots in tasks like "writing characters" and the game ball-in-a-cup. Compared to standard Reinforcement Learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also speeds up learning by a factor of 4 to 40, depending on the task.
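The combination the abstract describes, corrective advice folded into reward-driven policy search, can be sketched roughly as below. This is a simplified illustration under stated assumptions: a linear policy, a toy env interface (reset/step), and a get_human_advice callback are all hypothetical, and reward-weighted averaging stands in for the paper's specific Policy Search method.

```python
import numpy as np

def run_episode(theta, env, get_human_advice):
    """Roll out a linear policy; the human may nudge actions mid-execution."""
    s, done, total_reward = env.reset(), False, 0.0
    while not done:
        a = theta @ s                    # nominal action from the policy
        a = a + get_human_advice(s, a)   # corrective advice in the action domain
        s, r, done = env.step(a)
        total_reward += r
    return total_reward

def policy_search_with_advice(theta, env, get_human_advice,
                              n_iters=50, n_rollouts=10, sigma=0.05):
    """Reward-weighted policy search over human-corrected rollouts."""
    for _ in range(n_iters):
        candidates = [theta + sigma * np.random.randn(*theta.shape)
                      for _ in range(n_rollouts)]
        returns = np.array([run_episode(c, env, get_human_advice)
                            for c in candidates])
        weights = np.exp(returns - returns.max())  # softmax-style weighting
        theta = sum(w * c for w, c in zip(weights, candidates)) / weights.sum()
    return theta
```

The reward function supervises the update, so an occasional wrong correction lowers a rollout's weight instead of permanently corrupting the policy.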
... Some LfD algorithms focus on learning from corrective demonstrations: instead of deriving a policy from the demonstrations, they keep hand-coded algorithms as the primary source of the action policy and use the demonstration data only to make exceptions as needed [28][29][30][31]. ...
Article
Full-text available
The main goal of this article is to present COACH (COrrective Advice Communicated by Humans), a new learning framework that allows non-expert humans to advise an agent while it interacts with the environment in continuous action problems. The human feedback is given in the action domain as binary corrective signals (increase/decrease the current action magnitude), and COACH adaptively adjusts the amount of correction that a given action receives, taking state-dependent past feedback into consideration. COACH also manages the credit assignment problem that normally arises when actions in continuous time receive delayed corrections. The proposed framework is characterized and validated extensively using four well-known learning problems. The experimental analysis includes comparisons with other interactive learning frameworks, with classical reinforcement learning approaches, and with human teleoperators trying to solve the same learning problems by themselves. In all the reported experiments COACH outperforms the other methods in terms of learning speed and final performance. Notably, COACH has also been applied successfully to a complex real-world learning problem: ball dribbling by humanoid soccer players.
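The core mechanism the abstract describes can be sketched as follows. This is a simplified sketch, not the published COACH algorithm: the linear models and the step-adaptation rule are assumptions standing in for COACH's adaptive correction mechanism, and credit assignment for delayed corrections is omitted.

```python
import numpy as np

class BinaryCorrectiveLearner:
    """COACH-style learner: binary human signals adjust a continuous action."""

    def __init__(self, n_features, base_step=0.1, lr=0.05):
        self.w_policy = np.zeros(n_features)  # linear policy weights
        self.w_step = np.zeros(n_features)    # predicts feedback consistency per state
        self.base_step = base_step
        self.lr = lr

    def action(self, phi):
        """Continuous action for feature vector phi."""
        return float(self.w_policy @ phi)

    def feedback(self, phi, h):
        """Apply a binary corrective signal h in {-1, +1} for features phi."""
        # Consistent past feedback in one direction enlarges the correction step;
        # alternating feedback shrinks it toward fine adjustment.
        step = self.base_step * (1.0 + abs(self.w_step @ phi))
        error = h * step                        # desired change in the action
        self.w_policy += self.lr * error * phi  # move the policy toward the correction
        self.w_step += self.lr * (h - self.w_step @ phi) * phi  # adapt the step model
```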
... To construct a learning problem and apply it in the real world, we need a robot perception algorithm that can calculate the pose of the robot and the position of the ball. Most behavior learning approaches in the literature use this level of abstraction [3,10,11,13]. However, this level of abstraction requires solving complex perception problems, such as determining the robot's own pose from the features extracted from the camera image. ...
... Therefore, they use reinforcement learning to optimize two conflicting goals: being as fast as possible while keeping the ball as close as possible. Meriçli et al. propose a corrective human feedback system to teach the robot to dribble the ball through stationary defender robots [13]. The main contribution of that work is the combination of a hand-coded behavior with active demonstrations from the human. ...
Preprint
Full-text available
In imitation learning, behavior learning is generally done using features extracted from the demonstration data. Recent deep learning algorithms enable machine learning methods that take high-dimensional data as input. In this work, we use imitation learning to teach the robot to dribble the ball to the goal. We use the B-Human robot software to collect demonstration data and a deep convolutional network to represent the policies. We use the robot's top and bottom camera images as input and speed commands as output. The CNN policy learns the mapping between the series of images and the speed commands. In experiments in a realistic 3D robotics simulator, we show that the robot is able to learn to search for the ball and dribble it, but struggles to align with the goal. The best proposed policy model scores 4 goals out of 20 test episodes.
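An image-to-command policy of the kind this abstract describes might look like the PyTorch sketch below. All layer sizes are assumptions, as is stacking the two camera images as input channels and emitting three velocities (forward, lateral, rotational); the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class DribblePolicy(nn.Module):
    """CNN mapping stacked top/bottom camera images to speed commands."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5, stride=2), nn.ReLU(),  # 2 RGB images -> 6 channels
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # (v_x, v_y, omega) speed commands

    def forward(self, images):
        return self.head(self.features(images))

# Imitation learning then reduces to regressing the demonstrated speed commands
policy = DribblePolicy()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
```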
... Chipalkatty [8] shows how to incorporate human factors into a Model Predictive Control framework, in which human commands are predicted ahead of time. Utilizing machine learning, researchers have also looked into transferring human skills to robots [15] and incorporating human feedback into robot learning [12]. Several human-machine interfaces have been studied. ...
Chapter
Full-text available
In this chapter, we consider a robotic field exploration and classification task where the field robots have limited communication with a remote human operator and constrained motion energy budgets. We then extend our previously proposed paradigm for human–robot collaboration (Cai and Mostofi, Proceedings of the American Control Conference, pp 440–446, 2015 [4]; Cai and Mostofi, Proceedings of Robotics: Science and Systems, 2016 [5]) to the case of multiple robots. In this paradigm, the robots predict human visual performance, which is not necessarily perfect, and optimize seeking help from humans accordingly [4], [5]. More specifically, given a probabilistic model of human visual performance from [4], in this chapter we show how multiple robots can properly optimize motion, sensing, and seeking help. We mathematically and numerically analyze the properties of the robots' optimum decisions, in terms of when to ask humans for help, when to rely on their own judgment, and when to gather more information from the field. Our theoretical results shed light on the properties of the optimum solution. Moreover, simulation results demonstrate the efficacy of our proposed approach and confirm that it can save resources considerably.
... Studies have been conducted on incorporating human inputs into control schemes, such as model predictive control systems and vehicle routing algorithms [2], [3]. Utilizing machine learning, researchers have looked into how robots can master certain skills with humans' assistance [4], [5]. Branson et al. propose a human-computer interface that resembles the 20-question game for collaborative object classification [6], and Srivastava et al. propose a decision-support interface that facilitates human operators' decision making [7]. ...
Conference Paper
Full-text available
In this paper, we consider a collaborative human-robot Traveling Salesman Problem (TSP), where a robot is tasked with site inspection and target classification under a limited motion energy budget and with limited access to a human operator. More specifically, a robotic field operation is considered where a robot has to co-optimize seeking human assistance (via asking questions) and selective TSP tour design (for closer inspection), based on initial remote sensing. The robot has a limited budget both for communication with the human operator and for motion consumed by site inspection. By utilizing our past work on the target classification performance of humans and robots, we show how the collaborative human-robot TSP can be solved under limited resources. We further theoretically characterize the average correct classification probability as a function of the given number of questions to the human operator and the given motion energy budget. Extensive simulation results confirm our theoretical derivations.
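The ask-or-decide trade-off at the heart of this line of work can be illustrated with a toy allocation. This is only a greedy sketch under assumed inputs (per-site correct-classification probabilities for robot-alone versus with-human), not the paper's joint optimization, which also couples the question budget with the TSP tour.

```python
def allocate_questions(sites, p_robot, p_human, budget):
    """Greedily spend a limited question budget where human help gains most.

    p_robot[s] / p_human[s]: probability of correctly classifying site s
    alone vs. with human help (assumed known, e.g. from a sensing model).
    """
    ranked = sorted(sites, key=lambda s: p_human[s] - p_robot[s], reverse=True)
    ask = set(ranked[:budget])  # ask about the sites with the largest gain
    expected_correct = sum(p_human[s] if s in ask else p_robot[s] for s in sites)
    return ask, expected_correct

# Example: 4 sites, budget for 2 questions
p_robot = {0: 0.90, 1: 0.55, 2: 0.60, 3: 0.85}
p_human = {0: 0.95, 1: 0.90, 2: 0.88, 3: 0.90}
ask, expected = allocate_questions([0, 1, 2, 3], p_robot, p_human, budget=2)
# -> asks about sites 1 and 2, where the human adds the most accuracy
```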
... For instance, [10] shows how robots can recover from difficult states or failures by asking for help. In [25,35,36], a robot learns from human demonstration and correction, while in [23,28,32,39] a robot performs object detection and recognition with human input. In computer vision, a number of works have focused on designing human-machine interfaces that allow the vision algorithm to ask for human help when it encounters difficulties [4,9,30,40]. ...
... Ball dribbling is a complex behavior where a robot player attempts to maneuver the ball in a very controlled way while moving towards a desired target. Very few works have addressed ball-dribbling behavior with humanoid biped robots [5][6][7][8][9]. Furthermore, these works mention few details concerning specific dribbling modeling [10,11], performance evaluations for ball control, or the accuracy achieved relative to the desired path. ...
Conference Paper
Hierarchical task decomposition strategies allow robots and agents in general to address complex decision-making tasks. Layered learning is a hierarchical machine learning paradigm where a complex behavior is learned from a series of incrementally trained sub-tasks. This paper describes how layered learning can be applied to design individual behaviors in the context of soccer robotics. Three different layered learning strategies are implemented and analyzed using a ball-dribbling behavior as a case study. Performance indices for evaluating dribbling speed and ball-control are defined and measured. Experimental results validate the usefulness of the implemented layered learning strategies showing a trade-off between performance and learning speed.
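The layered-learning training loop described in this abstract can be sketched as follows. A minimal illustration: the sub-task names and the train/evaluate interfaces are hypothetical, and real implementations differ in how lower layers feed the ones above.

```python
def layered_learning(layers, train, evaluate):
    """Train sub-behaviors bottom-up, freezing each before the next layer.

    layers: ordered sub-tasks, e.g. ["ball_approach", "ball_control",
    "dribble_to_target"]; train(layer, frozen) returns a learned behavior
    given the already-frozen lower layers; evaluate scores the composite.
    """
    frozen = []
    for layer in layers:
        behavior = train(layer, frozen)  # learn this sub-task on top of the lower ones
        frozen.append(behavior)          # freeze it before moving up the hierarchy
    return frozen, evaluate(frozen)
```

Freezing each layer keeps the search space of every stage small, which is the source of the speed/performance trade-off the paper measures.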