Content uploaded by Alexandre Pitti
Author content
All content in this area was uploaded by Alexandre Pitti on Mar 31, 2018
Content may be subject to copyright.
MIRRORING MAPS AND ACTIONS REPRESENTATION
THROUGH EMBODIED INTERACTIONS
ALEX PITTI†
ERATO Asada project, JST, University of Tokyo
Tokyo, Japan
HASSAN ALIREZAI AND YASUO KUNIYOSHI
ERATO Asada project, JST, University of Tokyo
Tokyo, Japan
In this paper, we present a neural architecture that aims to reproduce the qualitative
properties of the mirror neuron system, which encodes neural representations of actions
either performed or observed. Several biological studies have emphasized some of its
important aspects, for instance the tight coupling between the sensorimotor maps, the
crucial role of timing (temporal information for encoding and detection), and the neural
connectivity. We attempt to model these in a network of spiking neurons that learns the
accurate temporal relationships between the sensorimotor maps. After learning, the
neural connectivity efficiently induces functional capabilities in the whole network,
which exhibits statistics comparable to those of small-world networks (e.g., scale-free
dynamics and hierarchical organization), similar to evidence observed in the mirror
neuron system.
1. Introduction
The discovery by Rizzolatti and his team of particular neurons that fire for
specific actions, whether performed by the subject himself or observed in
someone else executing them [1], brings out the tight coupling that exists between
perception and action. This neural population, termed the “mirror
neuron system” (MNS) and found in the premotor cortex, is important since
it sheds light not only on how our actions are represented within the brain but
also on how we understand those performed by others; action understanding is
hypothesized to be the first stage for infants to develop higher cognitive skills
such as social interaction and imitation [14, 15]. Recent observations of the human
and monkey MNS have drawn a good overall picture of its
characteristics and functionalities [1,2,14,17-19]. For instance, it fires robustly
with exact timing to executed and observed actions, even when the action
sequence's end is occluded. Nevertheless, despite these advances, little is
known about its underlying neural mechanisms and computational principles.
As these neurons represent a very small portion of a larger population of
broadly congruent neurons (only 10 to 30 percent present effective visuo-motor
congruency), some researchers question their real significance and
contest their importance [17, 18]: how could such a system support any
functional integration while relying on so few neurons? The computational models
proposed recently for reproducing its qualitative properties (cf. [14]) never
address this issue. We propose to investigate it and present our hypothesis on
how the MNS could be organized to represent actions and to fulfill its
qualitative and quantitative properties.
† Work supported by a grant from the ERATO Asada project, JST.
Our idea is as follows. To rely on so few neurons, the MNS must be critically
organized, as complex systems are (e.g., [6, 16, 20]). Efficient neural
connectivity might allow specific neurons to support the network's
functional integration, e.g., for cross-modal associations. In the first part, we
introduce our motivation for developing a neural network that can exhibit
the same functionality as the MNS. We show that its characteristics make it a
small-world network, a class of networks with exceptional structural properties
(e.g., fault tolerance, short path length). Using this network, we then reproduce
Rizzolatti's experiment, which exhibits the property of mirror neurons to fire both
during action execution and during action observation (e.g., during grasping). We
show that the network displays characteristics similar to those of the MNS, revealing
sensorimotor coupling and multi-modal integration (e.g., the re-enaction of one
modality from experiencing the other) that rely on critical neurons.
2. Motivation
The particular organization of the MNS presents some similarities with that of
complex systems, which may help us understand its functioning. For instance,
mirror neurons fire to either performed or observed actions, but with accurate
timing (e.g., firing only at the time-to-contact during grasping). We believe
that this temporal characteristic is important since it requires efficient
information propagation and therefore efficient neural connectivity to respond
robustly. However, the neurons firing at exact timing represent a minority
within the MNS: Gallese and Rizzolatti discovered that the MNS has an
atypical distribution [1] following roughly two classes labeled “strictly
congruent neurons” and “broadly congruent neurons”, with ratios of 1/3 and
2/3 respectively, suggesting that actions are represented by few neurons ([17, 18]
report a 1/10 and 9/10 partition). Since timing is crucial, these neurons must be
efficiently connected within the MNS in order to trigger contingently.
Therefore, these critical neurons must somehow generalize the spatio-temporal
structure of an action sequence into its action primitives [14]. Conversely,
damage to them can degrade the network performance and its
functional integration (e.g., one hypothesized cause of autism [10]). In
addition, new evidence demonstrating that actions are represented both
spatially and temporally at different description levels [2] further supports
the hierarchical nature of the MNS organization. Altogether, these
considerations suggest that, in order to exhibit the implicit pragmatic
representations from which the tight links between perception and action appear, the
MNS should follow a complex-systems architecture exhibiting (i) robust and
redundant information processing relying critically on time, (ii) an asymmetric
density distribution of the neural connectivity, and (iii) hierarchical and
distributed representations.
Those properties, summarized in Table 1, are a hallmark of scale-free and
small-world networks (SWN) [6, 7, 16, 20]. These are types of graphs whose
node connectivity follows a power-law distribution and which present
efficient information propagation and synchronization
at different time scales (i.e., scale-free dynamics [7, 16]). A majority of units
possess short-path-length connections with their neighbors, forming a
semi-closed cluster, whereas a minority possess long-path-length connections linking
those clusters to distant ones. These special neurons act as hub connectors that
link the “small worlds” to each other. Information exchange is particularly fast
because of the hierarchical organization, which combines centralization and
distribution and makes the network fault tolerant [20]. The suppression
of a large number of provincial units might weakly affect the network
performance, whereas the suppression of the most connected ones might
drastically affect it, making them critical.
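The small-world properties described above can be illustrated with a minimal Python sketch. It builds a ring lattice, rewires a fraction of its links at random in the manner of Watts and Strogatz [6], and measures the clustering coefficient and the mean path length. This is not the network used in this paper: the graph size (200 nodes, 6 neighbors each), the rewiring probability (0.1) and all function names are assumptions chosen only to make the effect visible.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """Redirect each lattice edge with probability p to a random new target."""
    n = len(adj)
    for i in range(n):
        for j in sorted(adj[i]):          # snapshot: adj[i] changes below
            if j > i and rng.random() < p:
                candidates = [m for m in range(n)
                              if m != i and m not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def clustering(adj):
    """Mean fraction of neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)                    # fixed seed for reproducibility
lattice = ring_lattice(200, 6)
small_world = rewire(ring_lattice(200, 6), 0.1, rng)
lattice_stats = (clustering(lattice), mean_path_length(lattice))
small_world_stats = (clustering(small_world), mean_path_length(small_world))
print("lattice     (clustering, path length):", lattice_stats)
print("small world (clustering, path length):", small_world_stats)
```

In such a sketch, a few rewired links shorten the mean path length considerably while most of the local clustering is preserved, which is the structural signature referred to in the text.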
Table 1. Qualitative and quantitative comparisons between the
properties of the mirror neuron system and of small-world networks.

Mirror neuron system                    | Small-world networks
----------------------------------------+----------------------------------------
Tight coupling between perception and   | Critical timing. SWN relies on few but
action: critical timing [1]             | critical units integrating globally the
                                        | local processes.
----------------------------------------+----------------------------------------
MNS distribution [1]:                   | “Hub connectors”. Connectivity
60% “coarse” neurons                    | distribution of the units in a SWN
30% “accurate” neurons                  | follows a power-law curve. SWN relies
Autism: no global integration of modal  | on few but critical units integrating
processes [10]                          | globally the local processes.
----------------------------------------+----------------------------------------
Neural representation at different      | Scale-free dynamics. Information is
description levels [2]                  | represented in hierarchies at multiple
                                        | time scales.
3. Experiments
3.1. The neural network model
We attempt to model the particular entanglement between perception and
action that characterizes the MNS for action representation (e.g., grasping a
cup). To this aim, we define two networks that receive, respectively, the visual
information from the camera and the somatosensory input from a haptic device
(the force feedback of our fingertips), see Fig. 1. Each neuron of the visual
map is associated with one pixel value. As the camera resolution is reduced to
60x90 pixels, 5400 neurons constitute the visual map. The somatosensory map
has 1000 neurons, each one associated with a spatial location on the
haptic device [5,9]. These neurons, all excitatory, are defined by the formal
model proposed by Izhikevich [11, 12], to which we add 2000 inhibitory
neurons in a separate hidden layer to stabilize the overall system. The initial
neural network is a standard random graph, so that the node connectivity has
a uniform distribution: each neuron, either excitatory or inhibitory, has 100
equally weighted synaptic connections to other neurons randomly selected over
the whole network. Therefore, the neural pairs within the same map support
the intra-map information processing (specialization), whereas the distant
neural pairs support the inter-map information processing (integration). Before
learning, these neural associations do not correspond to any particular
sensorimotor patterns; the mechanism of spike timing-dependent plasticity
(STDP) [4] then regulates the learning by updating the synaptic weights.
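The initial network described above can be sketched in a few lines of Python. The map sizes and the 100 connections per neuron come from the text; everything else is an assumption made for illustration: the function names, the choice of distinct targets, and the neuron parameters (a, b, c, d), which are the regular-spiking values published with the Izhikevich model and not necessarily those used in our experiments. The integration step (0.5 ms, simple Euler scheme) is likewise a simplification.

```python
import random

# Map sizes stated in the text.
N_VISUAL = 60 * 90   # one neuron per pixel of the 60x90 image -> 5400
N_SOMATO = 1000      # somatosensory map
N_INHIB = 2000       # inhibitory hidden layer
N_TOTAL = N_VISUAL + N_SOMATO + N_INHIB
FAN_OUT = 100        # synaptic connections per neuron


def map_of(index):
    """Name of the map a neuron index belongs to (indexing is an assumption)."""
    if index < N_VISUAL:
        return "visual"
    if index < N_VISUAL + N_SOMATO:
        return "somato"
    return "inhibitory"


def random_wiring(rng):
    """Give every neuron FAN_OUT distinct random targets, excluding itself."""
    wiring = []
    for pre in range(N_TOTAL):
        picks = rng.sample(range(N_TOTAL - 1), FAN_OUT)
        # Shift indices at or above `pre` by one to skip the self-connection.
        wiring.append([t if t < pre else t + 1 for t in picks])
    return wiring


def izhikevich_step(v, u, current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5):
    """One Euler step of the Izhikevich neuron; returns (v, u, fired)."""
    if v >= 30.0:                      # spike threshold reached: reset
        return c, u + d, True
    v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
    u = u + dt * a * (b * v - u)
    return v, u, False


def count_spikes(current, steps=2000):
    """Number of spikes of one neuron driven by a constant input current."""
    v, u, spikes = -65.0, -13.0, 0
    for _ in range(steps):
        v, u, fired = izhikevich_step(v, u, current)
        spikes += fired
    return spikes


wiring = random_wiring(random.Random(0))
# Targets of the first visual neuron that fall in the somatosensory map,
# i.e. candidate inter-map (integration) connections before learning.
inter_map = sum(map_of(t) == "somato" for t in wiring[0])
print("neurons:", N_TOTAL, "inter-map targets of neuron 0:", inter_map)
print("spikes without input:", count_spikes(0.0),
      "with constant input:", count_spikes(10.0))
```

Because the targets are drawn over the whole network, each neuron starts with both intra-map and inter-map connections, none of which carries any sensorimotor meaning before learning.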
Figure 1. Schematic of the experiment. The network receives the co-occurring visuo-tactile inputs
during grasping from the camera (bottom-right corner) and from the pressure sensitive device (in the
upper-left corner).
3.2. The learning procedure
The first stage consists of repeatedly grasping the tactile device. During
execution, the neurons of each map learn the invariant contingent relations
with their siblings within the same map (intra-map specialization) and with those
of the other map (inter-map integration). Over time, they assemble
into robust spatio-temporal clusters spanning the vision map and the
somatosensory map. As a result, after the learning period, if we
reproduce the grasping sequence, the network now anticipates the
exact time-to-contact: the specific somatosensory pattern is activated some tens
of milliseconds before the effective contact (see Fig. 2).
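The role of STDP in this anticipation can be made concrete with a minimal pair-based sketch in Python. It is a generic textbook form of the rule (cf. [4]), not the exact implementation used here: the amplitudes, the time constant, the weight bounds, the 15 ms lead of the visual spike and the number of grasps are all assumed values.

```python
import math

A_PLUS = 0.10    # assumed potentiation amplitude
A_MINUS = 0.12   # assumed depression amplitude
TAU_MS = 20.0    # assumed STDP time constant, in milliseconds
W_MAX = 10.0     # assumed upper bound on a synaptic weight


def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:       # pre fires before post: potentiation
        return A_PLUS * math.exp(-dt / TAU_MS)
    if dt < 0:       # post fires before pre: depression
        return -A_MINUS * math.exp(dt / TAU_MS)
    return 0.0


def train(weight, spike_pairs):
    """Apply the rule to a sequence of (t_pre, t_post) pairs, with clipping."""
    for t_pre, t_post in spike_pairs:
        weight = min(W_MAX, max(0.0, weight + stdp_dw(t_pre, t_post)))
    return weight


# Hypothetical scenario: on each of 50 grasps, a visual neuron fires 15 ms
# before a somatosensory neuron responds to the contact.
grasps = [(1000.0 * n, 1000.0 * n + 15.0) for n in range(50)]
forward_weight = train(1.0, grasps)                         # vision -> somato
reverse_weight = train(1.0, [(post, pre) for pre, post in grasps])
print("vision -> somato:", round(forward_weight, 3))
print("reverse ordering:", round(reverse_weight, 3))
```

Under these assumptions, the synapse whose presynaptic (visual) spike consistently precedes the postsynaptic (tactile) spike is strengthened, so that after learning the visual activity alone can drive the somatosensory pattern ahead of the contact.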
Figure 2. Neural dynamics of the visuo-tactile maps during physical interactions; the blue dots
represent the neuron spikes. In red (resp. cyan), the synaptic activation from the neurons of the
vision map (resp. the somato map). The retina anticipates the perceptual stimulus in the
somatosensory map before the time-to-contact. At the time-to-contact, the somatosensory map
generates in return a global activation.
This phenomenon, described by Berthoz as “anticipated touch” [13], shows
how the neurons of the visual map literally simulate the activity of the tactile
modality with precise timing. The tactile and the vision modalities become
intertwined, reproducing the mirroring effect of the neurons of area F5. It
follows that, when we effectively touch the device at the time-to-contact
(receiving tactile stimuli), it is then the tactile map that generates a global
activation in the whole network. The two modalities are so functionally
integrated that one perceptual stimulus can activate (or simulate) the missing
modality. Such a case occurs, for example, when we grasp an object with closed
eyes and mentally reconstruct its shape from touch (tactile → vision),
or when, observing someone else grasping, we mentally reconstruct the
respective tactile information (vision → tactile).
3.3. Functional comparison with the MNS
To complete the comparison with the MNS, we test the network's response under the
conditions of action observation, i.e., we provide the network with only the visual
stimuli (see Fig. 3).
Figure 3. Neural activity during observation of a grasping sequence (no tactile information received). At
the time-to-contact and during handling, the visual map, without tactile information, re-activates
nevertheless the somatosensory activity as during enaction.
Without tactile stimuli, the network nevertheless re-activates the same neural
pathways as during enaction (see Fig. 3) and reconstructs the missing modality
from only the visual information: the tactile information is perceived from the
visual information (red links) and fires back (cyan links). The dynamics of the two
maps thus reverberate with each other coherently, through a resonance-like process
with precise timing. This attests to the tight coupling between the two modalities and
suggests that the network is efficiently organized.
4. Analyzing the dynamics of the neural network
We analyze the network statistics with respect to the neural connectivity.
Fig. 4 a) presents the distribution of the neural group sizes relative to their time
span. Fig. 4 b) displays the neuron density distribution relative to the number
of synaptic connections. In Fig. 4 c), we analyze the network tolerance when
confronted with an attack (pruning neurons). The first graph displays the wide
distribution of the temporal range of the clusters, whereas the second shows the
power-law distribution of the unit connectivity. These two graphs explain how
actions are represented inside the network as spatio-temporal clusters at
multiple time scales and at different hierarchical levels.
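As a rough illustration of how a power-law connectivity distribution can be checked, the following Python sketch counts how many units have each number of connections and fits a straight line to the histogram in log-log coordinates. The degree values are synthetic (generated to follow an exponent of -2.5), not data from our network, and the function names are hypothetical. A least-squares fit in log-log space is only a quick diagnostic, not a rigorous test of a power law.

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Number of units for each strictly positive connection count."""
    return Counter(d for d in degrees if d > 0)


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree)."""
    points = [(math.log(k), math.log(c))
              for k, c in histogram.items() if k > 0 and c > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# Synthetic example: the number of units with k connections decays as k^-2.5.
synthetic_degrees = []
for k in range(1, 31):
    synthetic_degrees.extend([k] * round(10000 * k ** -2.5))

slope = loglog_slope(degree_histogram(synthetic_degrees))
print("estimated exponent:", round(slope, 2))
```

A distribution of this kind appears as a straight line with negative slope on log-log axes: most units have few connections and a small minority has many.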
Figure 4. Cluster statistics. Density distribution of the neural connectivity ordered by time
span (a) [resp. the longest path of the clusters defined and their time span]. The density of the
neural connectivity in (b) follows the power-law curve typical of small-world networks. The
network forms scale-free dynamics. In (c), we compare the network performance (firing rate of the
somato map) depending on whether we prune the neurons in random sequential order (blue line) or the
most connected neurons first (red line).
As introduced in section 2, these properties are those of small-world
networks [7] and match the MNS quantitative data separating the neurons into
two asymmetrically distributed classes: the broadly congruent neurons, the majority,
and the strictly congruent ones, the minority. This functional architecture is
hypothesized to imply efficient interregional communication, enhanced signal
propagation speed, computational power, and synchronizability [3, 6, 7]. For
instance, the network performance, computed as the firing rate in the somato
map normalized between a lower and an upper limit, decreases linearly
when we suppress neurons selected in random order (Fig. 4-c, blue line),
whereas the performance falls drastically if we prune the most connected
neurons first (Fig. 4-c, red line): the suppression of 37.5% of randomly
selected neurons or of the 7% most connected ones yields the same performance
score.
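The contrast between random pruning and pruning of the most connected units can be reproduced qualitatively on a toy graph, in the spirit of [20]. The Python sketch below is not our spiking network: it grows a graph by preferential attachment [16], and it swaps our performance measure (firing rate of the somato map) for a purely structural proxy, the fraction of nodes left in the largest connected component. The graph size, the 10% pruning level and all names are assumptions.

```python
import random
from collections import deque


def preferential_attachment(n, m, rng):
    """Grow a graph where each new node links to m nodes chosen by degree."""
    adj = {i: set() for i in range(n)}
    for i in range(m + 1):                 # small fully connected core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `endpoints` once per link it holds.
    endpoints = [i for i in range(m + 1) for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj


def giant_fraction(adj, removed):
    """Largest connected component among surviving nodes, over all nodes."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)


def prune(adj, fraction, targeted, rng):
    """Remove a fraction of nodes, at random or most connected first."""
    count = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda i: len(adj[i]), reverse=True)
    else:
        order = list(adj)
        rng.shuffle(order)
    return giant_fraction(adj, set(order[:count]))


graph = preferential_attachment(500, 2, random.Random(1))
random_score = prune(graph, 0.10, targeted=False, rng=random.Random(2))
targeted_score = prune(graph, 0.10, targeted=True, rng=random.Random(2))
print("random pruning:  ", random_score)
print("targeted pruning:", targeted_score)
```

With the same number of removed nodes, targeting the most connected ones fragments the graph far more than random removal, which mirrors the asymmetry between the blue and red curves of Fig. 4-c.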
The power-law curve means that neurons in a small-world network are not
completely independent from each other, and that a few of them dictate the action. In
our experiment, it means that the neural network has evolved into an efficient
system shaped by the synchronized and coordinated visuo-somatosensory
stimuli. The network produces different description levels of the action across
multiple time scales, assembled dynamically into short- and long-range clusters.
These clusters are organized around highly connected neurons (Fig. 4 b), which
articulate the scale-free temporal binding between the short-range time scale of
the neurons (on the order of milliseconds) and the long-range time scale of the
“body” (on the order of hundreds of milliseconds to seconds).
Precisely, some of the neurons are found to be critical within the network owing to
their large number of connections. Thus, not all neurons have the same
importance. These particular neurons direct the neural dynamics and sustain
the network's functional capabilities. It is remarkable that the network's functional
integrity relies on a relatively small population of neurons compared to its
size: fewer than 5 percent of the neurons in the network, approximately 300
neurons, possess more than ten synaptic connections. We
circled these critical neurons in red in Figure 5 and, in black, some clusters
connecting some of them during grasping. As can be seen from the graph,
they follow the trends of the visuo-somatosensory patterns (blue and green
lines). They therefore represent the primitives on which the dynamics of the
network are articulated and show how actions can be represented as scale-free
dynamics.
Figure 5. Critical neurons. We circled in red the neurons with more than ten synaptic connections
during the period of tearing [taken from Fig. 4]. They are critical for the functional integrity of the
network, and the clusters rely on them. We also plot some clusters passing through some of these neurons.
The blue and green lines plot their trends, the action primitives.
5. Discussion
In this paper, we put forward the hypothesis that the mirror neuron system is
organized critically as a complex network to represent actions. Our proposal is
supported by recent findings suggesting that action representations in the MNS
are modeled at different description levels with scale-free dynamics [1, 2, 17],
relying on very few neurons [18, 19]. The neural organization of our model has
a topological connectivity similar to that of scale-free networks [3, 6, 7], which
permits the cross-modal integration of visual and haptic information.
Although the network is highly robust against pruning, global integration is
nevertheless sensitive to the suppression of the few highly
integrated neurons, one hypothesis suggested for autism [10]. We expect that such a
neural mechanism might provide some principles for understanding how
cross-modal integration occurs for action representation and action understanding
during infant development [9, 15].
Acknowledgement
We would like to thank the JST ERATO project for supporting this
research and the anonymous reviewer for his helpful comments.
References
1. Rizzolatti, G., Craighero, L., “The Mirror-Neuron System”, Annu. Rev.
Neurosci. , 27:169–92, 2004.
2. Lestou, V., Pollick, F., Kourtzi, Z., “Neural substrates for action
understanding at different description levels in the human brain”, Journal of
Cognitive Neuroscience 20 (2), 324–341, 2008.
3. Bassett, D., Bullmore, E., “Small-world brain networks”, The Neuroscientist
12 (6), 512–523, 2006.
4. Abbott, L.F. and Nelson, S.B., "Synaptic plasticity: taming the beast", Nature
Neuroscience, 3, pp. 1178–1182, 2000.
5. Pitti, A., Alirezaei, H. and Kuniyoshi, Y., “Cross-modal and Scale-free
Action Representations through Enaction”, Neural Networks (in press).
6. Watts, D., Strogatz, S., Collective dynamics of ’small-world’ networks.
Nature 393, 440–442, 1998.
7. Buzsaki, G., Rhythms of the Brain. Oxford University Press, 2006.
8. Alirezaei, H., Nagakubo, A., Kuniyoshi, Y., A highly stretchable tactile
sensor skin for smooth surfaced humanoids. IEEE-RAS 7th Intl. Conf. on
Humanoid Robots, 512–523, 2007b.
9. Rochat, P., Five levels of self-awareness as they unfold early in life.
Consciousness and Cognition 12, 717–731, 2003.
10. Just, M.A., Cherkassky, V.L., Keller, T.A., Kana, R.K. and Minshew, N.J.,
"Functional and anatomical cortical underconnectivity in autism: evidence
from an FMRI study of an executive function task and corpus callosum
morphometry", Cerebral Cortex, 17:4, p. 951-961, 2007.
11. Izhikevich, E., Gally, A.J. and Edelman, G.M., "Spike-timing Dynamics of
Neuronal Groups", Cerebral Cortex, 14, p. 933-944, 2004.
12. Izhikevich, E., "Polychronization: Computation With Spikes", Neural
Computation, 18, p. 245-282, 2006.
13. Berthoz, A., The Brain's Sense of Movement, Harvard University Press,
2000.
14. Oztop, E., Kawato, M. and Arbib, M., "Mirror neurons and imitation: A
computationally guided review", Neural Networks, 2126, pp. 1-18, 2006.
15. Zukow-Goldring, "Assisted imitation: affordances, effectivities, and the
mirror system in early language development", from Action to Language via
the Mirror Neuron System, M.A. Arbib, 2005.
16. Barabási, A.-L. and Albert, R., "Emergence of scaling in random networks",
Science, 286:509-512, 1999.
17. Chong, T. Cunnington, R. Williams, M. Kanwisher, N. and Mattingley, J.
"fMRI Adaptation Reveals Mirror Neurons in Human Inferior Parietal
Cortex" Current Biology 18, 1576-1580, 2008.
18. Dinstein, I., "Human Cortex: Reflections of Mirror Neurons", Current
Biology 18, R956-R959, 2008.
19. Dinstein, I., Gardner, J. Jazayeri, M. Heeger, D., "Executed and Observed
Movements Have Different Distributed Representations in Human aIPS", The
Journal of Neuroscience, 28(44):11231-11239, 2008.
20. Albert, R., Jeong, H. and Barabási, A.-L., "Error and attack tolerance of
complex networks", Nature 406, p. 378–382, 2000. Erratum: Nature 409, p.
542, 2001.