Figure 2.3: (a) The initial three-component model of working memory proposed by Baddeley and Hitch. (b) A further development of the working memory model [3].


Source publication
Thesis
This thesis presents the Robot Imagination System (RIS). This system provides a convenient mechanism for a robot to learn a user's descriptive vocabulary, and how it relates to the world for action. With RIS, a user can describe unfamiliar objects to a robot, and the robot will understand the description as long as it is a combination of words that...

Contexts in source publication

Context 1
... To press forcefully against in order to move. Force: Energy or strength. Energy: Strength of force. Strength: The power to resist force. Figure 2.1: A network of definitions extracted from Webster's Dictionary containing circularities. To make use of such symbolic networks, non-linguistic knowledge is essential to ground basic terms of linguistic definitions ...
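The circularity in such a definition network can be made concrete by treating it as a directed graph and searching for a loop. The Python sketch below is purely illustrative: the headword of the first definition is truncated in the excerpt above, so "push" is an assumed label, and each entry is reduced to the terms its definition leans on.

```python
# Toy definition network: each headword maps to the terms its dictionary
# definition relies on. "push" is an assumed headword for the truncated
# first entry; the rest follow the Webster's chain quoted above.
definitions = {
    "push": ["force"],                # "to press forcefully against..."
    "force": ["energy", "strength"],  # "energy or strength"
    "energy": ["strength", "force"],  # "strength of force"
    "strength": ["force"],            # "the power to resist force"
}

def find_cycle(graph, word, path=()):
    """Depth-first search; returns the first definitional loop reached."""
    if word in path:
        return path[path.index(word):] + (word,)
    for term in graph.get(word, ()):
        loop = find_cycle(graph, term, path + (word,))
        if loop:
            return loop
    return None

print(" -> ".join(find_cycle(definitions, "push")))
# force -> energy -> strength -> force: the lookup never bottoms out.
```

Whatever headword the search starts from, it ends in a loop rather than in grounded meaning, which is exactly the point of the figure.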
Context 2
... just a dictionary of that language (e.g. Chinese/Chinese). The trip through the dictionary would amount to a merry-go-round, passing endlessly from one meaningless symbol or symbol-string (the definiens) to another (the definiendum), never coming to a halt on what anything meant. Deb Roy referred to this as "circular definitions", and depicted an English version of this issue in Figure 2.1 ...
Context 3
... number of visually grounded systems have been developed throughout the years. They have mainly focused on generating descriptions of synthetic and real scenarios, based on previously grounded information. One of the first works in linking grounded information to language was VIsual TRAnslator (VITRA) [43]. Dynamical situations are provided via video to VITRA, which in turn analyses them and automatically generates natural language descriptions of the movements it recognizes. Another approach [44] uses simple user-robot interaction and language games to conceptualize an object, though no further language grounding or inference possibilities are studied. DESCRIBER [2], see Figure 2.2, adds learning techniques to assign ranges of feature values to words. To achieve this task, the system is trained on manually transcribed human-made descriptions of computer-generated coloured rectangles. Every word is considered a potential label, and the words are filtered so that only relevant ones are used. The system assigns a subset of features to each word: an algorithm compares the feature distributions of descriptions that contain a given word with those that do not, finds the subset of features for which the two distributions are maximally divergent, and assigns those features to the word. The results achieved by this system allow generating semantic descriptions of objects selected by a user on a screen, including spatial relationships with respect to other objects. The system's tests use 10 non-overlapping rectangles. The features used are: red, green, blue, height-to-width ratio, area, X and Y position of the upper left corner, and the ratio of maximum to minimum dimension. Figure 2.2: Given this task, DESCRIBER [2] could generate the description "the horizontal purple rectangle below the horizontal green ...
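DESCRIBER's word-to-feature assignment lends itself to a compact sketch. The following Python is a hypothetical reimplementation of the idea only, not the original system: its exact divergence measure and search over feature subsets are not reproduced here; a per-feature Gaussian KL divergence and a top-k selection stand in for them, and all data in the usage example is invented.

```python
import numpy as np

def gaussian_kl(p, q):
    """KL divergence between 1-D Gaussians fitted to two samples
    (a stand-in for DESCRIBER's divergence measure)."""
    mp, sp = p.mean(), p.std() + 1e-9
    mq, sq = q.mean(), q.std() + 1e-9
    return np.log(sq / sp) + (sp**2 + (mp - mq)**2) / (2 * sq**2) - 0.5

def assign_features(features, descriptions, vocabulary, feature_names, top_k=1):
    """For each word, pick the feature(s) whose distribution diverges most
    between objects described with the word and objects described without it."""
    assignments = {}
    for word in vocabulary:
        present = np.array([word in d for d in descriptions])
        if present.all() or not present.any():
            continue  # no contrast: the word cannot be grounded this way
        divergences = [
            gaussian_kl(features[present, j], features[~present, j])
            for j in range(features.shape[1])
        ]
        best = np.argsort(divergences)[::-1][:top_k]
        assignments[word] = [feature_names[j] for j in best]
    return assignments

# Toy usage with the eight features listed above and invented data:
feature_names = ["red", "green", "blue", "h/w ratio", "area",
                 "x", "y", "max/min ratio"]
rng = np.random.default_rng(0)
feats = rng.random((10, 8))
descriptions = [{"horizontal"} if f[3] < 0.5 else {"vertical"} for f in feats]
print(assign_features(feats, descriptions, {"horizontal", "vertical"},
                      feature_names))
# Both words should map to "h/w ratio", the only feature that separates them.
```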

Similar publications

Conference Paper
Current robots have limited reasoning and language capabilities, making interaction with them an extremely challenging process. In the near future, we expect significant improvements in all of these areas, as well as the ability to learn new tasks through natural language instruction, a capability we call Interactive Task Learning (ITL)...

Citations

... The control structures and design of existing systems build a strong case for the inclusion of an architecture with semantic memory, perception, and other required modalities [3,16,21,22,27,28,32-37]. We used the minimalistic design of the previously reported Nature-inspired Humanoid Cognitive Architecture for Self-awareness and Consciousness (NiHA) (see Figure 1) [13]. ...
Article
The challenge in human–robot interaction is to build an agent that can act upon implicit human statements, where the agent is instructed to execute tasks without an explicit utterance. Understanding what to do in such scenarios requires the agent to be capable of object grounding and affordance learning from acquired knowledge. Affordance has been the driving force for agents to construct relationships between objects, their effects, and actions, whereas grounding is effective in understanding the spatial maps of objects present in the environment. The main contribution of this paper is to propose a methodology for the extension of object affordance and grounding, a Bloom-based cognitive cycle, and the formulation of perceptual semantics for context-based human–robot interaction. In this study, we implemented YOLOv3 for visual perception and an LSTM to identify the level of the cognitive cycle, as cognitive processes synchronized in the cognitive cycle. In addition, we used semantic networks and conceptual graphs to represent knowledge in various dimensions related to the cognitive cycle. The visual perception showed an average precision of 0.78, an average recall of 0.87, and an average F1 score of 0.80, indicating an improvement in the generation of semantic networks and conceptual graphs. The similarity index used for the lingual and visual association showed promising results and improves the overall experience of human–robot interaction.
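As a rough illustration of the kind of pipeline this abstract outlines (the detections, labels, and spatial rule below are invented placeholders, not the paper's data or method), the output of a visual front end can be lifted into a small semantic network of (subject, relation, object) triples:

```python
# Hypothetical detections as a YOLO-style front end might emit them:
# (label, confidence, (x, y, w, h)). All values are made up for illustration.
detections = [
    ("cup",   0.91, (120, 200, 40, 60)),
    ("table", 0.88, (60, 240, 300, 90)),
]

def to_triples(dets):
    """Build a toy semantic network: object nodes plus one coarse
    spatial relation derived from bounding-box geometry."""
    triples = [(label, "is_a", "object") for label, _, _ in dets]
    # Coarse grounding rule: a box that ends above another box's bottom
    # edge and sits within its horizontal span is "on" it.
    for a, _, (ax, ay, aw, ah) in dets:
        for b, _, (bx, by, bw, bh) in dets:
            if a != b and ay + ah <= by + bh and bx <= ax <= bx + bw:
                triples.append((a, "on", b))
    return triples

for t in to_triples(detections):
    print(t)
# ('cup', 'is_a', 'object'), ('table', 'is_a', 'object'), ('cup', 'on', 'table')
```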