Figure 1: The modified Robotis Bioloid robot STORM

Source publication
Conference Paper
Until recent years, the development of real-world humanoid robotics applications has been hampered by a lack of available mobile computational power. Unlike wheeled platforms, which can reasonably easily be expected to carry a payload of computers and batteries, humanoid robots couple a need for complex control over many degrees of freedom with a f...

Contexts in source publication

Context 1
... light of these experiences, we suggest the use of the IrDA protocol with the physical layer only as the most flexible solution. We implemented our own minimal IrCOMM layer to establish a connection between the phone and our hardware. Unfortunately, this requires the use of a microcontroller (e.g., PIC or AtMega AVR). However, many designs for complex robots such as humanoids or snake-like robots already require a microcontroller for motion planning and processing of sensory information.

Having described some of the hurdles that have to be navigated in terms of employing mobile phones as robot control hardware, we now discuss the actual use of these platforms for sophisticated, real-time artificial intelligence. There are many significant and interacting AI problems in the domain of mobile robotics, which is why this domain is ubiquitous from the standpoint of both research and teaching. As a basis for dealing with a complex world, a robot must first be able to move about in it coherently, and combine reactive and deliberative reasoning. Our agent architectures are reactive and behaviour-based, and use behaviour trees to support a balance between deliberative planning and reaction. A behaviour tree involves successive levels establishing a context for lower-level behaviours, which are implemented as finite state machines. These are described using our own XML-based meta-language, in order to provide a machine-processable description of the intent of a behaviour. The specification of behaviours includes preconditions (enter functions) and postconditions (exit functions). A slightly simplified example of a simple behaviour that scans for a target with increasing sweeps is shown in Table 1. The XML schemas include additional markup to refer to states by name (%%State("Random Walk")), to access variables (%%v), and to trigger transitions to other states (%%Transition). Behaviours are organized into behaviour trees. Higher-level behaviours can override or enable other lower-level behaviours. For example, a Perception behaviour may disable the scan-for-target behaviour and enable the state Target In Front if it recognizes the target.

One of the design goals of the meta-language was to be highly efficient. Instead of adding an XML parser and interpreter to the agent, the meta-language is parsed and interpreted offline and converted into highly efficient C code. This code is then compiled and executed on the mobile phone. For example, in the example above the programmer uses state names (e.g., "Random Walk" and "Scan For Target"), but these names are converted to integers in the C code. Because of this formalized state representation, we can also easily generate alternative representations when they are useful, such as visualizing the finite state machine as a graph. For example, figure 2 shows the state transition graph for a simple approach task, in which the robot first approaches a target and then walks away from it.
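As an illustration of the offline translation described above, the following is a minimal, hypothetical sketch (not the actual generated code, which the excerpt does not show) of how a behaviour's states and transitions might look once state names have been reduced to integers in C. All identifiers and the transition logic are invented for the example.

    /* Hypothetical sketch of translator output for the "Scan For Target"
     * behaviour: state names become integer constants, %%v variables become
     * struct fields, and %%Transition becomes an assignment to the state index. */
    #include <stdint.h>

    enum {
        STATE_RANDOM_WALK     = 0,
        STATE_SCAN_FOR_TARGET = 1,
        STATE_TARGET_IN_FRONT = 2
    };

    typedef struct {
        uint8_t current;      /* index of the active state          */
        int16_t sweep_angle;  /* example behaviour variable (a %%v) */
    } behaviour_ctx;

    static void behaviour_step(behaviour_ctx *ctx, int target_seen)
    {
        switch (ctx->current) {
        case STATE_SCAN_FOR_TARGET:
            ctx->sweep_angle += 5;                    /* widen the sweep   */
            if (target_seen)
                ctx->current = STATE_TARGET_IN_FRONT; /* %%Transition(...) */
            else if (ctx->sweep_angle > 90)
                ctx->current = STATE_RANDOM_WALK;
            break;
        case STATE_RANDOM_WALK:
        case STATE_TARGET_IN_FRONT:
            /* enter/exit functions and the bodies of the other states omitted */
            break;
        }
    }

A higher-level behaviour that overrides or enables this one would then simply gate calls to behaviour_step, mirroring the override/enable relationship in the behaviour tree.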
The actual behaviour code on the mobile phone must ultimately prescribe movements for the robot. These movements are defined atomically, and are developed beforehand as motor control programs using a software interface (figure 3). The interface allows one to move the robot into a specific position and save this position. The interface also allows one to set the trim (i.e., offset) for all joints as well as the home position. A separate window tab is used to combine these positions into motions. Each motion has a cycle time associated with it, and each part of a motion has an arrival time associated with it. Thus, the interface allows the user to easily adjust the speed of a whole motion or of individual parts of the motion. The trajectory of all joints is shown in the bottom window. STORM (figure 1) has twenty atomic motions, including: start walking, take step with right foot, take step with left foot, stop from left walk, stop from right walk, sideways step left, sideways step right, and kick with right foot. These movements are then available to be played back as required by any of our programs running on the mobile phone - here, by the particular state out of a behaviour currently in control of the robot.

In order to respond adaptively within a given state, and appropriately make transitions between states, a robot must be able to perceive its current environment (real-time vision), and know its current location within it (localization and mapping). The remainder of the paper explores the design and implementation of these two facilities, using mobile phones, as examples of adapting sophisticated artificial intelligence to these embedded devices.

STORM uses the Nokia phone's camera as its main sensor. The camera is used to properly approach objects in the field of view of the robot as well as to supply information for localization and mapping. To be robust enough to deal with a complex environment such as robotic soccer, the vision processing makes little use of colours, and uses a very fast approximate region segmentation algorithm. First, the algorithm scans the image and extracts scan line segments (i.e., segments of similar colour) of approximately the right size. This step is similar to standard region segmentation algorithms. However, we noticed that implementing a full union-find algorithm was too slow for a mobile phone, since it took about 2 seconds per image. The adaptations needed here are typical of adapting sophisticated computation to mobile phones: since most objects of interest in the soccer environment are relatively small, we use a flood fill pixel merge algorithm to find the associated region for a scanline. The flood fill algorithm keeps track of which pixels have previously been visited, and thus will visit each pixel at most once. The returned region is then checked for size (i.e., number of connected pixels), size of the bounding box, aspect ratio, and compactness. Only in the final step does the algorithm test whether the average colour of the region matches the object colour. If any of these tests fail, the object is rejected. Using only average colours of regions results in robust recognition of the ball and the goals, and takes on average approximately 200ms.
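The excerpt does not give the flood fill itself; the minimal sketch below shows the kind of region grower being described. Image dimensions, the colour-class array, and the region bookkeeping are all assumptions made for the example. The key property from the text - each pixel is visited at most once - is ensured by marking pixels as visited when they are pushed.

    /* Illustrative flood fill "pixel merge": grows a region of one colour
     * class from a seed pixel, tracking a bounding box and pixel count.
     * An explicit stack is used instead of recursion, since stack space on
     * a phone is scarce. All sizes and names here are invented for the sketch. */
    #define IMG_W 320
    #define IMG_H 240

    typedef struct { int min_x, max_x, min_y, max_y, count; } region_t;

    static unsigned char visited[IMG_W * IMG_H];   /* cleared once per frame        */
    static int stack[IMG_W * IMG_H];               /* each pixel pushed at most once */

    static void flood_fill(const unsigned char *cls, int x, int y,
                           unsigned char target, region_t *r)
    {
        int top = 0;
        int seed = y * IMG_W + x;

        if (visited[seed] || cls[seed] != target)
            return;
        visited[seed] = 1;
        stack[top++] = seed;

        while (top > 0) {
            int idx = stack[--top];
            int px = idx % IMG_W, py = idx / IMG_W;

            r->count++;                              /* region statistics */
            if (px < r->min_x) r->min_x = px;
            if (px > r->max_x) r->max_x = px;
            if (py < r->min_y) r->min_y = py;
            if (py > r->max_y) r->max_y = py;

            /* push in-bounds, unvisited neighbours of the same colour class */
            int nbr[4] = { idx - 1, idx + 1, idx - IMG_W, idx + IMG_W };
            int ok[4]  = { px > 0, px < IMG_W - 1, py > 0, py < IMG_H - 1 };
            for (int i = 0; i < 4; i++) {
                if (ok[i] && !visited[nbr[i]] && cls[nbr[i]] == target) {
                    visited[nbr[i]] = 1;
                    stack[top++] = nbr[i];
                }
            }
        }
    }

The caller would initialise the region to an empty bounding box before the call, and the resulting region_t would then be filtered by the size, bounding-box, aspect-ratio, and compactness tests described above before the final colour check.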
An approximation of the relative position of objects is possible by determining the pan and tilt angles of the phone (from the servo on which it is mounted), and then calculating the distance to the centre of the image. In this domain, it is safe to assume that these objects are on the ground plane. The relative position of an object at the centre of the image will have the closest approximation, so the camera is centered on important objects such as the ball before a decision is made as to what action to take next.

Goals are also detected as objects. Each goal is a distinct colour (prescribed by RoboCup rules). If both goal colours are found in one image, the regions of each goal colour are merged with other regions of the same goal colour. The goal colour that is present in the largest merged region is considered to be the goal currently being viewed.

To help the feature-based localization method described in the next section, we use a complex camera calibration based on the Tsai camera calibration algorithm (Tsai 1986). This calibration is only done once for each robot. Given this calibration information, we are able to map points in the image accurately to their real-world coordinates. This is essential, because it allows us to determine the distance and orientation of the robot to a feature point (ball, goal post, line).

Before localization can occur, features must be extracted from the image. The relevant features for localization on the soccer field are lines, goals, and the centre circle. For every 5th column in the image, the system scans from the bottom of the image towards the top. If there is a transition from a green pixel to a white pixel, the pixel p is remembered in a list. The scan continues upward, so there may be more than one transition pixel in a column. Lines are then found by running a gradient-guided Hough transform (Hough 1962). For each point p_i, a set of adjacent points is determined. Triplets are formed from these by including one point to the left of p_i and one point to the right of p_i. There are several triplets that can be formed this way out of the neighbourhood of adjacent points. Each triplet votes for an unbounded line in the image. This vote is fuzzified by voting for a small range of slopes through the point p_i. The peaks in the Hough accumulator space determine the equations of possible lines. For each peak in the accumulator space, we search along the pixels determined by the line equation to find start and end points of the lines. ...
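For concreteness, a hedged sketch of the fuzzified voting step is given below. The accumulator resolution, the two-bin vote spread, and the assumption that each triplet supplies an estimate of the line's normal angle are invented for the illustration; the excerpt only states that each triplet votes for a small range of slopes through the point.

    /* Illustrative Hough accumulator for lines in normal form
     * rho = x*cos(theta) + y*sin(theta). Resolutions are invented. */
    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define N_THETA 180                 /* ~1 degree angular resolution   */
    #define N_RHO   800                 /* distance bins (offset by half) */

    static unsigned short accum[N_THETA][N_RHO];

    /* One fuzzified vote: a transition point (x, y) whose triplet suggests
     * the line normal lies near est_theta (radians) votes for a small band
     * of angles around that estimate instead of a single bin. */
    static void vote_point(int x, int y, double est_theta)
    {
        int centre = (int)(est_theta / M_PI * N_THETA);
        for (int d = -2; d <= 2; d++) {
            int t = ((centre + d) % N_THETA + N_THETA) % N_THETA;
            double theta = t * M_PI / N_THETA;
            double rho   = x * cos(theta) + y * sin(theta);
            int r = (int)(rho + N_RHO / 2);       /* shift so negative rho fits */
            if (r >= 0 && r < N_RHO)
                accum[t][r]++;
        }
    }

Peaks in accum then give candidate (theta, rho) pairs, and the start and end points of each line are recovered by walking along the corresponding pixels, as described above.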
Context 2
... in the average AI lab would be platforms such as the Pioneer-II, which are large enough to carry laptops or full-size internal computing systems, but remain similarly expensive and carry significant demands due to their size (heavy lead-acid batteries and larger motors). Conversely, recent years have brought about a revolution in available computational ability in embedded systems from the standpoint of mobile robotics. Smaller, powerful, and less power-hungry processors, cheaper flash memory, and better battery technology have combined to allow far more effective embedded systems than were previously possible. Consequently, a generation of systems that are lighter and more robust now affords the possibility of smaller, lighter, and more adaptable robots. For the same reasons, these small, powerful embedded systems have also moved out of the industrial sector and into the realm of consumer electronics, giving much higher computational ability in embedded devices for everything from video equipment to automobiles. In particular, mobile phones have evolved from basic telephone and contact management abilities to handheld computers supporting sophisticated applications. The latter provide particularly exciting possibilities for AI and robotics: they combine powerful computational abilities with on-board peripheral devices (cameras, accelerometers, GPS units, Bluetooth networking) that are in many cases improvements over what was available just a few years ago and would otherwise have to be separately managed.

Our work involves the development of control, planning, learning, and vision in humanoid robots. While small embedded processors have previously been used to power small humanoid robots (e.g. (Yamasaki et al. 2001), Manus I (Zhang et al. 2003), Tao-Pie-Pie (Baltes and Lam 2004), Roboerectus (Zhou and Yue 2004), and Hansa Ram (Kim et al. 2004)), these examples range in cost from $1000 to $20,000 US. Currently, we have moved from using these older types of embedded systems to developing sophisticated robotics platforms using mobile phones. Used modern mobile phones can be had for $100-$200 US (or indeed, even for free as a result of recycling programs) and provide all the facilities necessary to power complex adaptive humanoid robots for a fraction of the cost of several years ago.

Our interest in humanoid robots is in developing the kinds of broad adaptive behaviour that are necessary to support service robots of the future (e.g. for nursing or firefighting). These behaviours include being able to actively balance on uneven surfaces (e.g. move through grass or gravel), plan complex motions such as crawling, carrying, and climbing, as well as combinations of these (e.g. pick up dirty laundry from underneath the bed), and interact with other robots or humans (e.g. move furniture in groups). The broad nature of these tasks is extremely challenging to AI in general, let alone intelligent systems running on small embedded processors such as mobile phones.

We have been competing for the last three years at major robotics competitions (RoboCup, FIRA) using humanoids whose main computational demands are supported using mobile phones. While RoboCup (RoboCup 2009) involves mainly soccer and a few challenges closely related to soccer (e.g. a ball throw-in), the FIRA HuroCup (FIRA 2009) competition is specifically designed to encourage the development of the types of broad robotic skills in which we are interested.
The same physical robot must be able to participate in events ranging from basketball free-throws to obstacle course runs to a climbing wall, taking place over extended periods of time. The computing demands to support the artificial intelligence necessary for such a range of activity (managing everything from computer vision, to active balancing and intelligent control, to localization and planning) would tax a full-sized desktop system, let alone a modern mobile phone.

This paper explores our experiences with using mobile phones for supporting sophisticated real-time artificial intelligence in the domain of robotic control. We begin by describing our typical research platform. Following this, we describe issues in adapting phones for these purposes, and discuss variations in OS, IO support, and issues in software development. We then illustrate the abilities of mobile phones for AI by describing three elements of our work that are representative of the difficulty of supporting AI on such systems: real-time computer vision, localization, and ...

For a physical platform, we begin with the Robotis Bioloid humanoid robot kit: these provide 18 degrees of freedom, use reasonably powerful and robust motors given their cost, and are far easier to acquire and assemble than building skeletal components from scratch. The Robotis kit includes a small AVR ATMega128 embedded controller for managing the individual servos of the robot. In our work, this is only used for low-level position control of the servo motors. Figure 1 shows one of our robots, STORM, using this platform, along with a mounted Nokia 5500 mobile phone for perception (visual feedback, active balancing sensors) and on-board computing.

The main drawback of using mobile phones is that they provide very little in the way of IO resources. We therefore add a custom-built IrDA interface, based on the Microchip MCP2150 IrDA transceiver, to the humanoid kit. This allows the mobile phone to control the high-level motions of the robot. While the Bioloid kit comes with firmware that can record and play back basic motions, this is not suitable for the complex motions we require, so we replace this firmware with our own (which required reverse-engineering part of the original Robotis firmware). The firmware also supports a 3-axis accelerometer from Analog Devices, so that phones that do not have internal accelerometers can use an external sensor for active balancing.
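The excerpt does not include the firmware itself; the sketch below is a hypothetical illustration of the resulting architecture, in which the MCP2150 handles the IrDA framing and presents a plain serial byte stream to the ATMega128, so that the replacement firmware reduces to reading command bytes and playing back stored motions. The command encoding, uart_getc(), and play_motion() are invented placeholders, not the actual interface.

    /* Hypothetical AVR-side command loop. Bytes sent by the phone arrive
     * over IrDA, are de-framed by the MCP2150, and appear on the AVR's UART. */
    #include <stdint.h>

    extern uint8_t uart_getc(void);          /* placeholder: blocking UART read     */
    extern void    play_motion(uint8_t id);  /* placeholder: replay a stored motion */

    #define CMD_MOTION_BASE 0x20             /* invented command encoding     */
    #define NUM_MOTIONS     20               /* STORM's twenty atomic motions */

    int main(void)
    {
        for (;;) {
            uint8_t cmd = uart_getc();
            if (cmd >= CMD_MOTION_BASE && cmd < CMD_MOTION_BASE + NUM_MOTIONS)
                play_motion((uint8_t)(cmd - CMD_MOTION_BASE));
            /* other command ranges (trim, sensor queries, ...) omitted */
        }
    }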
There is a huge variety of mobile phones available on the market, and dozens more are released each year. The cost of these devices is extremely competitive compared to many embedded systems (given their speed, memory, and included sensing devices), because they are produced in huge volume. While economy of scale and the ability to have many necessary sensing devices included in an embedded system are very attractive to a researcher interested in supporting artificial intelligence and robotics on such systems, one is also well advised to heed the old motto: caveat emptor. Even from the same manufacturer, individual phones often have different versions of the same OS, support different extensions, and may sometimes run totally different OSs. The model number often confuses more than it helps in trying to decipher the OS that is run by a device. For example, the Nokia 6600 and 6680 are Nokia Series 60 devices (a very good OS for robotics purposes), whereas the Nokia 3300 and 3500 are Nokia Series 30 devices, which are not programmable. But the Nokia 6230 is a Series 30 device and the Nokia 3230 is a Series 60 device.

It is also important to realize that mobile phone manufacturers see these phones as finished consumer products, and therefore do not expect them to be "illicitly hacked" (from their perspective) to be used as embedded control systems. At best, some manufacturers encourage the development of third-party applications, but these applications often run in a sandbox which strictly limits which hardware is accessible to the application. In spite of these hurdles, mobile phones can provide an extremely cheap development platform with a high-speed processor, LCD, buttons, wireless networking, Bluetooth, infrared, and one or two cameras in a very small and lightweight package. This section details our experiences with adapting these devices for robotics applications, including working with real-time operating systems, developing software, and ultimately developing an IrDA interface for supporting IO.

The most ubiquitous development environment for mobile devices is Java 2ME (J2ME) from Sun. It is available for a large number of devices and is standardized. However, J2ME really only standardizes the language and some of the GUI components, as well as data structures: several key technologies of interest to a researcher are only available as JNR libraries, which may or may not be supported. For example, we purchased 13 "developer phones" from Sony Ericsson in 2004. These Z1010 phones include two cameras, Bluetooth, infrared, and external storage on a memory stick. Initial development went smoothly, and even though (as expected) the frame rate of the vision processing was slow, it would have been sufficient for robotic soccer. We found out the hard way, however, that Sony Ericsson does not allow access to Bluetooth or IrDA infrared, nor to the external memory - even for developer devices. The company also refused to allow us to return the phones once we found out about their limitations. We therefore do not recommend Sony Ericsson phones in any way.

After several failed attempts at using J2ME for image processing, we chose Nokia devices that run the Symbian OS S60 development environment. The main reason for this choice was that Nokia's SDK is more open and supports development in C++ and J2ME as well as other languages such as Python. The Symbian SDK Carbide provided by Nokia for its phones is Windows-based and uses Eclipse as an integrated development environment (IDE). The tool chain includes the GCC ARM compiler, assembler, and linker, but also several other tools to help manage various builds (emulator, debug, release) and to help in internationalization. The first tool, bldmake, takes as input a .bld file which specifies the source files as well as the required libraries, and generates various makefiles in several directories and the abld.bat file, which is used to control the build process. abld is a generated script that allows the building of debug or release versions for real hardware or ...
Context 3
... the attached hardware. We used this scheme in our small humanoid robots STORM (figure 1) and ROGUE ...

Similar publications

Conference Paper
Magnetometers and accelerometers are sensors that are now integrated in objects of everyday life like automotive applications, mobile phones and so on. Some applications need information of acceleration and attitude with a high accuracy. For example, MEMS magnetometers and accelerometers can be integrated in embedded devices like mobile phones and GPS rece...

Citations

... Two demonstrators that are heterogeneous along different dimensions are also employed. The first is a humanoid robot based on a Bioloid kit, using a cell phone for vision and processing [16]. The choice of a humanoid was made because it provides an extremely different physiology from the imitator in terms of how motions made by the robot appear visually. ...
Article
Imitation learning enables a learner to improve its abilities by observing others. Most robotic imitation learning systems only learn from demonstrators that are similar physically and in terms of skill level. In order to employ imitation learning in a heterogeneous multi-agent environment, we must consider both differences in skill, and physical differences (physiology, size). This paper describes an approach to imitation learning from heterogeneous demonstrators, using global vision. It supports learning from physiologically different demonstrators (wheeled and legged, of various sizes), and self-adapts to demonstrators with varying levels of skill. The latter allows different parts of a task to be learned from different individuals (that is, worthwhile parts of a task can still be learned from a poorly-performing demonstrator). We assume the imitator has no initial knowledge of the observable effects of its own actions, and train a set of Hidden Markov Models to create an understanding of the imitator's own abilities. We then use a combination of tracking sequences of primitives and predicting future primitives from existing combinations of primitives, using forward models to learn abstract behaviors from demonstrations. This approach is evaluated using a group of heterogeneous robots that have been previously used in RoboCup soccer competitions. © 2012 International Journal of Automation and Smart Technology.
... Two demonstrators that are heterogeneous along different dimensions are also employed. The first is a humanoid robot based on a Bioloid kit, using a mobile phone for vision and processing [14]. The choice of a humanoid was made because it provides an extremely different physiology from the imitator in terms of how motions made by the robot appear visually. ...
Conference Paper
Imitation learning enables a learner to improve its abilities by observing others. Most robotic imitation learning systems only learn from demonstrators that are homogeneous physiologically (i.e. the same size and mode of locomotion) and in terms of skill level. To successfully learn from physically heterogeneous robots that may also vary in ability, the imitator must be able to abstract behaviours it observes and approximate them with its own actions, which may be very different than those of the demonstrator. This paper describes an approach to imitation learning from heterogeneous demonstrators, using global vision for observations. It supports learning from physiologically different demonstrators (wheeled and legged, of various sizes), and self-adapts to demonstrators with varying levels of skill. The latter allows a bias toward demonstrators that are successful in the domain, but also allows different parts of a task to be learned from different individuals (that is, worthwhile parts of a task can still be learned from a poorly-performing demonstrator). We assume the imitator has no initial knowledge of the observable effects of its own actions, and train a set of Hidden Markov Models to map observations to actions and create an understanding of the imitator's own abilities. We then use a combination of tracking sequences of primitives and predicting future primitives from existing combinations using forward models to learn abstract behaviours from the demonstrations of others. This approach is evaluated using a group of heterogeneous robots that have been previously used in RoboCup soccer competitions.
Conference Paper
Mobile gaming is an arena full of innovation, with developers exploring new kinds of games, with new kinds of interaction between the mobile device, players, and the connected world that they live in and move through. The mobile gaming world is a perfect playground for AI and CI, generating a maelstrom of data for games that use adaptation, learning and smart content creation. In this paper, we explore this potential killer application for mobile intelligence. We propose combining small, light-weight AI/CI libraries with AI/CI services in the cloud for the heavy lifting. To make our ideas more concrete, we describe a new mobile game that we built that shows how this can work.