Autonomous cognitive developmental models of
robots: A Survey
Ke Huang, Xin Ma*, Guohui Tian, Yibin Li
School of Control Science and Engineering
Shandong University
Jinan, China
Abstract—This article intends to provide an overview of the
state of the art in developmental models of cognitive robots.
With the development of artificial intelligence, robots have
become able to perform a variety of complex tasks under human
control. However, it remains a challenge for robots to explore
and develop their cognitive abilities in a specific environment
as human beings do. Current research and experiments endow
robots with some cognitive abilities, such as identifying their
own bodies, exploring, and forming concepts. We analyze the
patterns and models of robotic cognitive development in
cutting-edge research and discuss the advantages and
disadvantages of these models. From this analysis, we draw
some conclusions that may guide future research.
Keywords—autonomy; robotics; learning; cognitive pattern;
development models
In recent years, the huge leap in data processing speed has
contributed to the rise of high-tech sectors such as Big Data
and cloud computing. It has also revitalized artificial neural
networks, whose development had once stalled because of
limited computing power. Artificial intelligence (AI) has been
booming. Within AI, an important direction is cognitive robots,
which possess the ability of autonomous cognitive development.
According to Aly, cognitive robots can perceive the
environment, act, and learn from experience to adapt their
generated behaviors to interaction in an appropriate manner [1].
The significance of cognitive robots lies not only in
establishing artificial cognitive systems that can interact
effectively with humans, but also in helping to verify results
from developmental psychology and neuroscience [2].
At present, the relevant research has made great
achievements. Nakajo et al. [3] enabled a robot to accurately
distinguish its own body parts from the external environment.
Meola et al. [4] drew on the interplay of rhythmic and discrete
movements; their model allowed robots to learn to explore and
to develop basic actions such as reaching and grasping.
Gemignani et al. [5], Olier et al. [6] and Aguilar et al. [7]
established different conceptual representations for robots to
learn and identify objects. These abilities are prerequisites
for robots’ continuous development. Robots could learn
language and implement complex behaviors in [8, 9, 10], and
even developed simple social skills in [11]. Developing
cognitive abilities requires an appropriate model. Vernon et al.
[2] analyzed many studies and summarized ten desiderata for
developmental cognitive architectures to guide model design.
However, a cognitive structure is not a simple combination of
several desiderata or functional modules. It is a complex
system that generates and coordinates different cognitive
functions. Current studies have established developmental
models based on theories from psychology, neuroscience and
computer science to develop different cognitive behaviors of
robots.
This paper therefore intends to provide deeper insight into
developmental models for cognitive robots. We present a
subjective analysis of current research to seek a more
appropriate modeling approach. The rest of the paper is
structured as follows: Section 2 presents a detailed analysis of
developmental patterns, Section 3 summarizes the current
methods for establishing developmental models, Section 4
discusses the two preceding parts, and Section 5 draws
conclusions that may suggest future research endeavors.
Human cognitive development is an emergent process.
People interact with other people and with the environment to
accumulate knowledge. Gradually, they develop their own
actions and the ability to predict events [2]. Artificial
intelligence, however, is created by humans, so the cognitive
developmental pattern of robots may take many forms. The
pattern not only provides a concrete framework for
establishing developmental models, but also defines the
explicit assumptions of cognitive models [2]. Current cognitive
research follows two patterns.
One pattern is based on pre-defined knowledge or rules.
The knowledge that robots learn grows gradually in line with
specified structures or meanings. For example, Aguilar et al.
[7] defined the knowledge structure as a combination of an
object’s characteristics, emotion, expectation, and action. The
environment representation described by Gemignani et al. [5]
has two parts: the first is a knowledge base, and the second is
defined in the form of semantic grids. Besides, Park et al. [8]
predefined the rules for action switching in their experiment.
This approach is easier to implement. However, its significant
drawback is that it limits the robots’ learning ability. For
instance, robots can only learn in a particular situation or
complete a specific task.
Therefore, robots developing in this pattern are not completely
The other is the emergent paradigm. Olier et al. [6] argued
that the representation of a concept should result from the
dynamics of action-environment coupling, rather than from a
classification of objects defined by humans. This pattern is
closer to the way humans develop. Robots can develop
autonomously from a primitive state to a fully cognitive state
over time. Internal and external motivations drive robots to
explore and learn about the world. They can perceive, act,
adapt and anticipate. Meanwhile, robots continually accumulate
knowledge as their own experience [2]. This allows robots to
develop different cognitive behaviors without too many
We describe the whole process of cognitive development
under the second pattern in Fig. 1. Robots develop from a
primitive state. Just as infants have instincts, robots should be
endowed with some necessary attributes before development.
These attributes include the robot’s interest, attention, affect,
and knowledge structure. When robots begin to develop, they
first identify their own bodies to distinguish themselves from
the environment. Then, driven by motivations, robots learn in
the environment through multimodal perception and
exploratory actions. The results of learning are used to form or
update knowledge and skills, which are stored in memory. At
the same time, this knowledge and these skills can be reused in
the learning process. After development, robots can interact
autonomously with the environment. The ways of learning are
summarized as several models in Section 3.
Fig. 1. Process of cognitive development of robots.
The developmental model endows robots with learning
capacity and determines their learning method. That is to say, it
determines how knowledge is represented and how robots learn
from interaction with the environment. Thus the developmental
model plays an important role in robots’ cognitive development.
It is established according to learning mechanisms. One kind of
cognitive ability may be developed by different models based
on different learning mechanisms, and the same model can also
develop different cognitive abilities. Considering this
difference, we divide the models in current research into the
following types according to their learning mechanisms; Fig. 2
classifies the articles mentioned in this paper. The following
subsections elaborate on these types of models.
Fig. 2. Classification of learning mechanisms for developmental models.
A. Application of infant cognitive development theories
Inspired by the cognitive processes of infants, some studies
have applied developmental theories to robotics. From these
theories, researchers can understand how infants build their
knowledge structures and develop their behaviors, language
and other complex skills, and can then let robots learn like
human infants.
A well-known theory cited by many studies is the theory of
infant cognitive development proposed by Piaget. In this
theory, the learning process of an infant has four stages [12].
Li et al. [12] implemented the first three stages on a real robot.
In the sensorimotor stage, the target object with the largest
information entropy in the visual field was chosen, and its
color histogram, texture histogram and location were combined
into a sample-based representation. In the pre-operational
stage, the robot used a support vector machine (SVM) to form a
single symbolic representation from the sample-based
representation. In the concrete operational stage, a latent
support vector machine (LSVM) was used to build a symbolic
representation with multiple features. Finally, the robot
developed the ability to identify objects. Aguilar et al. [7], in
contrast, focused on the first and second sub-stages of the
sensorimotor stage and specified the objects of interest
according to Piaget’s theory. The knowledge structure consists
of many schemas, each containing the perception of an object
and a related action. The computational model is
Developmental Engagement-Reflection (Dev E-R): the robot
automatically generates knowledge in the engagement phase
and then analyzes, evaluates and processes unknown situations
in the reflection phase. The whole experiment was executed in
a virtual scene. The robot could decide how to operate on an
object according to its own emotions and could produce
cognitive curiosity. Both studies enable a robot to identify
objects accurately through exploratory actions, but their ideas
of development differ. In the first, the representation of
knowledge evolves from simple to complex and from concrete
to abstract during development. In the second, knowledge
evolves by being constantly expanded and modified. We think
these two ideas should be combined. In other words, the
development of knowledge should start with simple and
concrete forms; after knowledge has accumulated to a certain
quantity, its representation undergoes a qualitative leap and
becomes complex and abstract. However, how robots can
switch between these two processes automatically is still an
open question.
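As a rough illustration of the entropy-based target selection used in the sensorimotor stage of Li et al. [12], the following Python sketch picks the image patch whose intensity histogram has the largest Shannon entropy. It is a minimal sketch under stated assumptions, not the authors' implementation: the grayscale input, the patch size, the number of histogram bins and the function names shannon_entropy and select_target are all introduced here for illustration.

```python
import numpy as np

def shannon_entropy(patch, bins=16):
    """Shannon entropy (in bits) of the intensity histogram of a grayscale patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins, since 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def select_target(image, patch_size=8):
    """Return the (row, col) corner of the highest-entropy patch (hypothetical helper)."""
    best_entropy, best_corner = -1.0, (0, 0)
    rows, cols = image.shape
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            h = shannon_entropy(image[r:r + patch_size, c:c + patch_size])
            if h > best_entropy:
                best_entropy, best_corner = h, (r, c)
    return best_corner

# Toy visual field: uniform background with one textured region.
rng = np.random.default_rng(0)
image = np.full((32, 32), 0.5)
image[8:16, 16:24] = rng.random((8, 8))
target = select_target(image)
```

In this toy scene the uniform background has zero entropy, so the textured region is selected as the target. A real system would compute the histogram features on color and texture channels rather than on a single intensity channel.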
Other studies develop robots’ behaviors on the basis of
infants’ action development. Luo et al. [13] studied the
development of robots’ basic abilities according to Corbetta’s
recent findings in psychology. While the robot moved its arms
randomly, like an infant’s babbling movements, an auto-encoder
was used to learn the perception of the joints and the arm from
the raw data. A feedforward neural network (FNN) was then
employed to map the perceived arm joint angles to the
perceived arm position and orientation. For the reverse
mapping, a hierarchical inverse model (HIM) was finally
chosen after comparison with two other models. The robot
could thus imitate the three stages of the infant reaching
process. Nishide et al. [14] also established the relationship
between visual information and proprioception through
babbling, inspired by the process of learning to draw, and then
used a Multiple Timescales Recurrent Neural Network
(MTRNN) to learn action sequences. The main idea in
establishing movement developmental models is to map joint
angles to joint positions. Robots then learn action primitives by
exploring the environment by themselves or by being taught by
human partners. When encountering an unknown situation,
robots can produce new actions based on accumulated
experience. According to Schillaci et al. [15], there are two
methods of generating exploration behaviors: random motor
babbling and goal-directed exploration, such as
curiosity-driven exploration. Goal-directed exploration is more
efficient than random exploration. How to generate goals that
drive actions is therefore an important research direction, and
many studies have indeed been devoted to it.
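The idea of random motor babbling followed by reaching can be sketched in Python as follows. This is an illustrative simplification and not the model of any cited study: a simulated planar two-link arm stands in for the robot, the link lengths are hypothetical, and a nearest-neighbour lookup over the babbled samples replaces the learned inverse models (such as HIM) described above.

```python
import numpy as np

L1, L2 = 0.3, 0.25  # hypothetical link lengths in metres

def observe_hand(q):
    """Stand-in for the robot's perception: planar two-link forward kinematics."""
    x = L1 * np.cos(q[..., 0]) + L2 * np.cos(q[..., 0] + q[..., 1])
    y = L1 * np.sin(q[..., 0]) + L2 * np.sin(q[..., 0] + q[..., 1])
    return np.stack([x, y], axis=-1)

def babble(n_samples, rng):
    """Random motor babbling: sample joint angles and record the observed hand positions."""
    q = rng.uniform(low=[0.0, 0.0], high=[np.pi, np.pi / 2], size=(n_samples, 2))
    return q, observe_hand(q)

def reach(target, joint_memory, hand_memory):
    """Nearest-neighbour inverse model: recall the joint angles whose outcome was closest."""
    idx = np.argmin(np.linalg.norm(hand_memory - target, axis=1))
    return joint_memory[idx]

rng = np.random.default_rng(0)
joint_memory, hand_memory = babble(5000, rng)
target = observe_hand(np.array([1.0, 0.8]))  # a point known to be reachable
q_reach = reach(target, joint_memory, hand_memory)
error = np.linalg.norm(observe_hand(q_reach) - target)
```

The reaching error shrinks as more babbling samples are collected, which also hints at why purely random exploration is inefficient: most samples fall far from any particular goal.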
B. Learning from sensorimotor experience
Robots acquire knowledge of the world through interaction
with the environment. Some studies used sensorimotor
experience to establish developmental models and predicted the
next action from past behaviors. For instance, MTRNN [9, 16]
is widely used in the development of behaviors and language.
In this model, sub-networks with different time constants
self-organize features at different time scales. The network
takes past visual information and proprioception as input and
predicts the internal state and proprioception at the next time
step. The model controls the robot’s next action based on the
predicted value. The backpropagation through time (BPTT)
algorithm is used to obtain the optimal model by minimizing
the prediction error during training [16]. Park et al. [8] built a
model with sub-networks at four time scales. In this model, the
fast sub-network learns to encode action primitives, and the
slow sub-networks connect these action primitives to produce
complex actions. Tani [9] proposed a recurrent neural network
with parametric biases (RNNPB), in which the parametric
biases (PB) represent intentional variables. The top-down
process generates actions on the basis of intention, and the
bottom-up process minimizes the prediction error to update the
network’s weights and intentional state. The two processes
interact continually and can switch the intentions of actions.
Moreover, associative learning of language and behavior can be
implemented by binding the PB values in a language RNNPB
model and a behavioral RNNPB model. This idea can be
applied to the study of multimodal fusion. Yamada et al. [17]
likewise utilized MTRNN to integrate language and behaviors.
However, RNNs are more suitable for processing linguistic or
action sequences [8]. To make better use of visual image
information, Hwang et al. [18, 19] proposed a Visuo-Motor
Deep Dynamic Neural Network (VMDNN) that combines the
advantages of different neural networks. They used MTRNN to
deal with action sequences and to control attention. Meanwhile,
they used a Multiple Spatio-Temporal Neural Network
(MSTNN) to process spatio-temporal image information. One
of the significant advantages
of this model is selecting a suitable network for different
Another model is the deep generative model, which
combines the inference ability of probabilistic models with the
generalization ability of neural networks. Olier et al. [6] used
the internal activations of an MTRNN as probabilistic states
and used the prediction error as input to generate actions. The
core of the model adopts a variational recurrent neural network
(VRNN), which adds temporal dependencies to variational
auto-encoders to handle the dynamic updating of internal
states. The advantage of this model is obvious: robots can
explore the environment independently, without any prior
knowledge, and obtain a dynamic conceptual representation in
the form of an action-environment coupling.
However, during the learning process, there are no internal
demands or goals to drive the robot to move and interact with
the environment.
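The multiple-timescale idea behind MTRNN, on which several of the models in this subsection build, can be illustrated with a leaky-integrator update in which each unit has its own time constant. The Python sketch below uses a commonly used discrete-time form of such networks and is not taken from [9] or [16]; the network sizes, the time constants (2 and 50 steps), the random weights and the toy sinusoidal input are assumptions, and no training by BPTT is performed.

```python
import numpy as np

def mtrnn_step(u, x, W_rec, W_in, tau):
    """One leaky-integrator update; each unit i has its own time constant tau[i]."""
    y = np.tanh(u)                      # firing rates from internal states
    drive = W_rec @ y + W_in @ x        # recurrent plus external input
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * drive

rng = np.random.default_rng(0)
n_fast, n_slow, n_in = 6, 4, 3
n = n_fast + n_slow
tau = np.concatenate([np.full(n_fast, 2.0), np.full(n_slow, 50.0)])  # assumed values
W_rec = rng.normal(scale=0.5, size=(n, n))   # untrained random weights
W_in = rng.normal(scale=0.5, size=(n, n_in))

u = np.zeros(n)
states = []
for t in range(200):
    x = np.sin(0.3 * t + np.arange(n_in))    # toy sensory input
    u = mtrnn_step(u, x, W_rec, W_in, tau)
    states.append(u.copy())
states = np.array(states)

# Mean absolute step-to-step change of fast versus slow units.
change = np.abs(np.diff(states, axis=0)).mean(axis=0)
fast_change, slow_change = change[:n_fast].mean(), change[n_fast:].mean()
```

Units with a small time constant follow the input closely, while units with a large time constant change slowly, which is the property that lets fast sub-networks encode primitives and slow sub-networks sequence them.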
Sensorimotor experience can also allow robots to develop
the ability to identify their own bodies, which requires the
robot to form body representations. Visuo-motor coordination
is widely used for this purpose. Nakajo et al. [3] proposed a
stochastic continuous time recurrent network (S-CTRNN). This
network aims at predicting the uncertainties of the sensory
inputs and computing the correlation coefficients between the
moving joints and the objects in view. Saegusa et al. [20] also
evaluated the correlation between visual motion and
proprioception. Their model can identify multiple body parts
and even the body extended by tools. Schillaci et al. [15]
summarized a number of studies showing other methods, such
as attaching linguistic labels to body postures, combining
visual and auditory modalities, and self-organizing multiple
modalities.
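The visuo-motor correlation principle behind these body-identification methods can be sketched as follows. This Python example is a simplified stand-in and not the S-CTRNN of [3] or the model of [20]: it uses a plain Pearson correlation between a proprioceptive signal and the motion observed in each image region, and the synthetic signals, the threshold of 0.7 and the function name self_body_mask are assumptions made for illustration.

```python
import numpy as np

def self_body_mask(joint_speed, region_motion, threshold=0.7):
    """Flag image regions whose motion correlates strongly with the robot's own joint speed."""
    corr = np.array([np.corrcoef(joint_speed, m)[0, 1] for m in region_motion])
    return corr > threshold, corr

rng = np.random.default_rng(0)
T = 300
joint_speed = np.abs(np.sin(np.linspace(0, 12 * np.pi, T)))   # proprioceptive signal
hand_motion = 2.0 * joint_speed + 0.1 * rng.normal(size=T)    # region covering the robot's hand
object_motion = rng.random(T)                                 # independently moving object
background = 0.05 * rng.normal(size=T)                        # static background noise

is_body, corr = self_body_mask(joint_speed, [hand_motion, object_motion, background])
```

Only the region whose motion follows the robot's own movement is labeled as part of the body. The same test would also mark a held tool as body, which matches the tool-extended body reported in [20].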
C. Simulation of brain mechanisms
The brain is the organ that controls all human cognitive
activities, and brain mechanisms have been studied
extensively. Some studies have applied findings about the brain
to cognitive robots. The computational models are mainly
based on the brain’s working principles and on its memory,
respectively.
Inspired by perceptual control theory, Franchi et al. [21]
proposed the Intentional Distributed Robotic Architecture
(IDRA) to imitate the principles of three brain areas. This
model is an open network of deliberative modules (DMs), each
composed of a working memory (WM) and a goal-generator
module (GM). The WM represents the cerebral cortex and is
used to receive information from other DMs. Moreover, the
WM can perform unsupervised categorization and return the
cortex activation in response to the actual sensory input. The
GM acts as the thalamus and can generate new objectives
through Hebbian learning. The instincts module (IM) acts as
the amygdala and broadcasts its signal to the DMs. These
modules cooperate with each other and autonomously generate
new goals to promote the development of the robot’s cognitive
behaviors. Hwang et al. [18, 19] utilized a Prefrontal Cortex
(PFC) network, which is similar to the human prefrontal
cortex, to integrate MTRNN and MSTNN. The model uses
visual information to generate actions and to control the
switching of attention; that is to say, it can coordinate
cognitive processes. As for brain memory, Salgado et al. [22]
analyzed the principles of long-term and short-term memory
and then built the Multilevel Darwinist Brain (MDB)
evolutionary cognitive architecture. It has three asynchronous
time scales: execution, learning and knowledge consolidation.
In the learning process, the World Model of external perception
and the Internal Model of internal perception generate new
behaviors. The system selects the behavior that maximizes the
satisfaction value in the Satisfaction Model. The best models
are saved as experience in long-term memory. On the execution
scale, the system executes the most appropriate behavior
selected from among those in memory and those newly
generated. An assessment structure helps to form the best
action strategies. In further research, Salgado et al. [23] used a
simplified MDB model augmented with a predictive model and
a motivational model. Actions are generated under the
guidance of internal motivation.
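The selection step described for MDB, in which candidate behaviors are evaluated through a world model and the one with the highest predicted satisfaction is chosen, can be sketched in Python as follows. This is only a schematic of that selection principle: in MDB the models are learned by evolutionary methods, whereas here world_model and satisfaction_model are fixed, hand-written, hypothetical functions, and the two-dimensional state, goal and candidate actions are invented for illustration.

```python
import numpy as np

def world_model(state, action):
    """Hypothetical world model: predicts the next 2-D position after an action."""
    return state + action

def satisfaction_model(predicted_state, goal):
    """Hypothetical satisfaction model: higher when the prediction is nearer the goal."""
    return -np.linalg.norm(predicted_state - goal)

def select_action(state, candidates, goal):
    """Pick the candidate whose predicted outcome has the highest satisfaction."""
    scores = [satisfaction_model(world_model(state, a), goal) for a in candidates]
    return candidates[int(np.argmax(scores))]

state = np.array([0.0, 0.0])
goal = np.array([1.0, 0.0])
candidates = [np.array(a) for a in ([0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1])]
best = select_action(state, candidates, goal)
```

Because the behavior is chosen from predicted rather than executed outcomes, the quality of the decision depends directly on the quality of the learned models, which is why the architecture keeps the best models in long-term memory.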
At present, many mysteries of the brain remain unsolved,
and the study of brain mechanisms is a hot research field. The
significance of this kind of model is twofold: it not only
promotes the cognitive development of robots, but research on
robot cognitive development also helps researchers, in turn, to
understand the working principles of the brain.
D. Other Approaches
The research above has made great achievements in robots’
cognitive development, and most of the approaches mentioned
focus on neural networks. Nevertheless, a few exceptions
adopted different ideas, in parallel with or inspired by these
endeavors. For example, Ramík et al. [24] considered curiosity
as a drive for knowledge acquisition on both the perceptual and
the semantic level. Their system was based on the visual
saliency principle and salient object detection. It used a
multi-layer perceptron structure to control the visual attention
parameter. A human then named the salient objects in an
utterance. A genetic algorithm searched for appropriate belief
values that minimize the difference between the robot’s
interpretation and the utterance given by the human. The robot
combined the observations and the interpretation as its
knowledge of the world. Following a similar idea, Gemignani
et al. [5] presented a method for incremental online semantic
mapping that lets a robot understand its surrounding
environment. They built a four-layered environmental
knowledge representation, in which the environment was
expressed as grids and the objects were marked with symbols.
The robot extracted the features of each object with image
processing algorithms, while humans labeled the objects and
areas observed by the robot through natural language.
However, the limitation of these two approaches is that the
knowledge is pre-defined rather than acquired entirely by the
robot. Taniguchi et al. [25] proposed a nonparametric Bayesian
spatial concept acquisition method, which allowed the robot to
obtain place names from uttered sentences and to use the
acquired spatial concepts to reduce the uncertainty of
self-localization effectively. Compared with the first two
methods, it gives robots more autonomy. Besides, Best et al.
[11] described a learning algorithm that clusters social cue
observations and defines “N-Most Likely States” for each
cluster. Consequently, the robot can develop social awareness
and identify human emotions.
Cognitive developmental patterns and models were
discussed in Sections 2 and 3. When starting research, the
developmental pattern is the first thing to consider, as it
determines the idea of development. The first pattern is easier
to implement, owing to the pre-defined knowledge and rules.
Robots understand these with specific meanings, whereas they
may have different meanings in other situations. Thus robots
may only work in a particular area or perform particular tasks,
which greatly limits their cognitive ability. The second pattern
allows robots to explore and learn in different environments to
acquire knowledge, or to use their own experience to deal with
new situations. The meanings of knowledge and actions are
built by the robot itself. Therefore, robots can be more
independent.
Table 1 summarizes the studies cited in this work and, for
each of them, lists the developed skills, the model and its
basis. We can see that models with different learning
mechanisms can develop the same cognitive ability, such as
basic exploration actions. But each mechanism has its own
developmental emphasis and offers different inspiration.
According to the emergent paradigm, the cognitive
development of robots starts from an original state. Infant
cognitive development theory provides an explicit idea and an
initial state. Most studies referred to the staged development
process of infants and built models to imitate the particular
developmental task of each stage. Another significant
mechanism is curiosity. Curiosity drives humans to explore
new knowledge and belongs to internal goals. Besides, external
goals generated in interaction drive humans to execute tasks.
Both internal and external goals are needed for robot
development. Learning from sensorimotor experience, in turn,
focuses on developing concrete abilities, such as learning
behaviors and language. This type of model can integrate a
variety of modalities, including vision, audition and action, so
that robots can develop more complex cognitive abilities. As to
brain mechanisms, studies mainly exploited the coordinating
function of the brain. Robots can also use a brain-like network
to coordinate a variety of cognitive behaviors and achieve
smooth switching and cooperation. In addition, this
coordination ability may be a solution for robots to achieve the
autonomous transition from one stage to another. The other
methods also contain many innovations and can give us some
guidance in specific developmental tasks.
Study | Abilities developed | Pattern | Model | Learning mechanism
Nakajo et al. [3] | Identifying body | Emergent paradigm | S-CTRNN | Predictive learning
Meola et al. [4] | Primitive actions | Emergent paradigm | Reinforcement learning | Rhythmic and discrete movement of infants
Gemignani et al. [5] | Knowledge representation | Pre-defined knowledge | Semantic mapping | Other Approaches
Olier et al. [6] | Concept representation | Emergent paradigm | VRNN | Learning from sensorimotor
Aguilar et al. [7] | Building new behaviors | Pre-defined knowledge | Dev E-R | Piaget’s theory
Park et al. [8] | Identifying human gestures and imitating actions | Pre-defined rules | MTRNN | Learning from sensorimotor
Tani [9] | Shifts of action intention, language-action associations, and learning of goal-directed actions | Emergent paradigm | RNNPB, MTRNN | Learning from sensorimotor
Maniadakis et al. [10] | Rule switching and confidence | Pre-defined rules | CTRNN | Brain mechanism
Best et al. [11] | Social signal detection | Pre-defined knowledge | N-Most Likely States | Other Approaches
Li et al. [12] | Recognition of objects | Emergent paradigm | SVM and LSVM | Piaget’s theory
Luo et al. [13] | Reaching | Emergent paradigm | Auto-encoder, FNN, and HIM | Corbetta’s psychology discovery
Nishide et al. [14] | Drawing imitation | Emergent paradigm | MTRNN | Human’s drawing skills
Yamada et al. [17] | Integration of language and behavior | Pre-defined knowledge | MTRNN | Learning from sensorimotor
Hwang et al. [18, 19] | Seamless integration of cognitive skills | Emergent paradigm | VMDNN | Visuo-motor coordination
Saegusa et al. [20] | Perception of the self and primitive actions | Emergent paradigm | Multi-layer perceptron | Visuo-motor correlation
Franchi et al. [21] | Learning new goals and composing new behaviors | Emergent paradigm | IDRA | Brain mechanism
Salgado et al. [22, 23] | Learning procedural representations | Emergent paradigm | MDB | Brain mechanism
Ramík et al. [24] | Discovering objects autonomously and learning new knowledge | Pre-defined knowledge | Multi-layer perceptron and genetic algorithm | Other Approaches
Taniguchi et al. [25] | Spatial concept acquisition | Pre-defined knowledge | Bayesian spatial concept acquisition | Other Approaches
Baxter et al. [26] | Human–robot interaction | Pre-defined knowledge | Distributed Associative Interactive Memory | Other Approaches
Pointeau et al. [27] | Accumulating and consolidating experience | Pre-defined knowledge | Autobiographical memory | Brain mechanism
Cervantes et al. [29] | Planning and decision-making | Emergent paradigm | Bio-inspired computational model | Brain mechanism
Reder et al. [30] | Learning behaviors | Pre-defined knowledge | Case Based Reasoning | Brain mechanism
V. CONCLUSION
This study presented a limited-scope review of autonomous
cognitive developmental models of robots in the current
literature. It analyzed two developmental patterns and
summarized prevalent modeling methods. Based on this
information, we can find some inspiration for our future
research.
In our opinion, the emergent paradigm is a more promising
research direction for cognitive development. Robots should
have the ability to develop independently, and pre-defined
knowledge or rules may limit this ability. In future research,
we will follow the idea in Fig. 1 and first consider how to build
the primitive state of robots. We should figure out the optimal
curiosity approach for robots’ interest and attention. Moreover,
we are going to add emotional factors to the primitive state,
because affect can help robots make decisions in the learning
process. Then, when robots start developing, we also need a
method for identifying the robot’s body. As for learning
models, each approach offers some insight, and we plan to
combine their advantages. Specifically, the developmental
framework of robots can be established according to the
process of infant development. For the concrete realization of
cognition at each stage, we can build artificial neural networks
to learn knowledge and behaviors from sensorimotor
experience. In particular, deep learning has achieved good
results on dynamic time-sequence problems. To coordinate
various cognitive behaviors, we need a structure similar to the
cerebral cortex. We also require a memory to store knowledge.
Some articles have even proposed letting robots dream: one
way is to consolidate what has been learned [27], and another
is to create new knowledge in dreams based on existing
knowledge [28]. Additionally, most studies focused on
combining visual and proprioceptive information, or on
associating language and actions. However, few works consider
integrating all three modalities: vision, audition and action.
Referring to Hwang’s model, we plan to add an MTRNN to
process auditory information and generate speech signals.
