Research of a Neuron Model with Signal
Accumulation for Motion Detection
Alexander Kugaevskikh
Department of information technologies
Novosibirsk State University
Novosibirsk, Russia
https://orcid.org/0000-0002-6676-0518
Abstract— The paper presents a new model of the MT neuron (middle temporal area neuron) that detects movement and determines its direction and speed without using recurrent connections. The model is based on the accumulation of the signal and is organized using a space-time vector that sets the weight coefficients. Despite its combinatorial redundancy, the model is assumed to be more resistant to glare than optical-flow methods.
Keywords— neural network, motion detection, MT neuron,
bio-inspired model
I. INTRODUCTION
In the visual cortex, motion analysis begins in the primary
visual cortex. Although its primary function is to highlight
edges, complex cells respond to movement in a particular
direction within their receptive field. More in-depth analysis
of movement is performed in areas V3 and V5 (MT) of the
brain. Eventually, a general map of the movements within the
visual field is formed.
In computer vision, the problem of motion analysis is most often solved by applying the optical flow equation. When neural networks are trained to detect motion, they too effectively rely on the mechanism underlying optical flow: finding the direction of change in pixel brightness. Approaches based on recurrent links, such as GRU [1], LSTM [2], STCNN [3], ResNet [4], and STAL [5], are generally accepted for motion detection. The problem with such neural networks is that if a network is not trained to detect movement in a particular direction, it will not detect it. This is especially true for video analytics systems: if we tilt the camera by 45 degrees and do not retrain the neural network, the movement will not be detected. We offer an alternative approach that uses a bio-inspired neuron model for motion detection. In bio-inspired architectures, the most common model is the Heeger model [6, 7], which is analyzed in detail in [8]. The Heeger model is based on Gabor energy, which combines the real and imaginary parts of the Gabor filter convolved with the image. Using the imaginary part is reasonable in signal processing, since it compensates for redundancy at low frequencies, but it makes no sense in image processing when convolving with pixel brightness.
II. EDGE DETECTION
In our proposed model, we not only get rid of the Gabor
energy, but also construct the MT neuron in a different way to
better match the optical flow equation. The motion analysis
begins by highlighting the edges. For this purpose, we
constructed a two-layer neural network [9] based on the use of
the Gabor filter and the hyperbolic tangent as functions for
generating receptive fields of neurons.
The edge selection neural network receives as input the L* channel (pixel luminance) of the image in the CIE L*a*b* colour space [10]. Given the specifics of this colour space, we no longer need the imaginary part of the Gabor filter.
For brightness segmentation, a two-layer neural network is proposed. The first layer highlights lines of a certain orientation; the second layer is responsible for selecting combinations of lines, including corners. The difference from trained convolutional layers, in particular the first layer of simple cells in the neocognitron, is the use of pre-configured receptive fields for the first-layer neurons, which increases the predictability and interpretability of the results of such a neural network.
Each layer contains three types of neurons that differ in the configuration of their receptive fields. The links between the layers are organized in a special way: each neuron of the second layer is connected to only two neurons of the first layer. Thus, the neurons of the second layer select lines and corners (in the case of the Gabor filter) and quadrilaterals (in the case of the hyperbolic tangent).
The receptive fields based on the Gabor filter are described by

G(x, y) = exp(−(x̂² + γ²ŷ²) / (2σ²)) · cos(2πx̂/λ + ψ),

where x̂ and ŷ are the coordinates rotated to the preferred orientation, and the phase ψ, responsible for the symmetry of the filter kernel, is introduced as an alternative to a quadrature pair of filters. It takes the values −π/2 or π/2 to detect antisymmetric components, and 0 or π for symmetric components; σ is the filter scale; γ is the degree of the filter ellipticity (it defines the elongation of the filter kernel along the ordinate axis). The wavelength λ is introduced solely to simplify the selection of the optimal kernel without varying σ and γ, and can be expressed through these parameters.
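As a minimal sketch of the filter above (not from the paper; the function name `gabor_kernel` and the default parameter values are illustrative), the kernel and the role of ψ can be written as:

```python
import numpy as np

def gabor_kernel(size=7, theta=0.0, sigma=2.0, gamma=0.5,
                 lam=4.0, psi=0.0):
    """Gabor filter kernel. psi controls kernel symmetry:
    0 or pi gives a symmetric (even) kernel, +/-pi/2 an
    antisymmetric (odd) one, replacing a quadrature pair."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates to the preferred orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

sym = gabor_kernel(psi=0.0)          # symmetric component detector
anti = gabor_kernel(psi=np.pi / 2)   # antisymmetric component detector
```

Flipping the symmetric kernel about its centre leaves it unchanged, while the antisymmetric kernel changes sign, which is exactly the distinction ψ encodes.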
The first two types of neurons respond to lines of a preferred orientation; their receptive fields are formed using the Gabor filter. Neurons of the third type are required to detect zones of brightness difference, and a smooth function must be used to form their receptive field in order to take into account effects such as blurring in fog and shadows. For this reason, the Haar wavelet cannot be applied, but the receptive field can be configured using the hyperbolic tangent.
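A possible form of such a tanh-based receptive field (an assumption for illustration; the paper does not give the exact expression, and the name `tanh_edge_kernel` is ours) is a smooth step across the brightness boundary, rotated to the preferred orientation:

```python
import numpy as np

def tanh_edge_kernel(size=7, theta=0.0, scale=1.0):
    """Sketch of a brightness-difference receptive field built from
    the hyperbolic tangent: a smooth step across the edge, rotated
    to the preferred orientation theta (assumed form)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # coordinate perpendicular to the edge
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.tanh(xr / scale)
```

Unlike the Haar wavelet's hard step, the `scale` parameter lets the transition be as gradual as a shadow boundary or fog blur requires, while the kernel stays zero-mean.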
This paper was financially supported by the Russian Foundation for
Basic Research (Grant No. 19-57-45006).
The neurons of the first layer (named U_C1) of the edge selection neural network use a linear activation function

U_C1^p(x, y) = Σ_{i,j} G^p(i, j) · L(x + i, y + j),

where p is the neuron's type, (x, y) are the convolution coordinates, and L is the pixel brightness matrix of the input image.

The neurons of the second layer use the sigmoid activation function and operate on the "winner-takes-all" (WTA) principle.
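The two-layer pass can be sketched as follows (illustrative only; the names `conv2d_valid` and `second_layer_wta` are ours, and the exact sigmoid/WTA formulation in the paper may differ):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Linear first-layer response: 'valid' sliding-window product of
    the brightness matrix L with a pre-configured kernel
    (cross-correlation, as in CNN 'convolution')."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def second_layer_wta(responses):
    """Sigmoid activation plus winner-takes-all across neuron types:
    at each position only the strongest type keeps its output."""
    stack = sigmoid(np.stack(responses))      # (types, H, W)
    winners = stack.argmax(axis=0)
    wta = np.zeros_like(stack)
    idx = np.indices(winners.shape)
    wta[winners, idx[0], idx[1]] = stack[winners, idx[0], idx[1]]
    return wta
```

The WTA step guarantees that, per image position, exactly one neuron type reports a non-zero response, which is what makes the second-layer output interpretable as "a line of this orientation is here".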
III. MT NEURON MODEL
Motion detection requires a spatio-temporal organization of the motion detection neural network. Movement, in this case, is the sequential activation over time (i.e. from frame to frame) of several edge selection neurons located along the same direction in a certain neighbourhood. Thus, the MT neuron can give the direction of movement α and its speed v. The MT neuron, like the previous neurons, is created for each type p. The connections of the MT neuron with the U_C2 neurons of the corresponding type determine its receptive field.
To detect linear motion, the receptive field of the MT neuron (U_MT^l) includes a sequence of U_C2 neurons along the α direction. To detect rotation, the receptive field of the corresponding MT neuron (U_MT^r) instead consists of connections with neurons located at the same centre of the receptive field but having different orientations θ. The rotation detection neuron is created twice, for the two directions of rotation. Later, this will make it possible to apply inhibitory connections to suppress parasitic activation.
The weights of the MT neurons are set using the product of two Gaussians: the first is responsible for the spatial characteristic, and the second sets the attenuation coefficient of the link weight over time.
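A minimal sketch of such a space-time weight vector (our reading of the description above; the name `space_time_weights`, the default sizes, and the pairing of the k-th neuron with lag k are assumptions, not taken from the paper):

```python
import numpy as np

def space_time_weights(n=5, sigma_s=2.0, sigma_t=1.0):
    """Product-of-Gaussians link weights for an MT neuron: the k-th
    U_C2 neuron along the preferred direction is paired with frame
    lag k, so its weight is a spatial Gaussian (offset from the
    receptive-field centre) times a temporal attenuation Gaussian."""
    offsets = np.arange(n) - n // 2   # spatial offsets along alpha
    lags = np.arange(n)               # frames into the past
    g_space = np.exp(-offsets**2 / (2 * sigma_s**2))
    g_time = np.exp(-lags**2 / (2 * sigma_t**2))
    return g_space * g_time
```

With these defaults the weights decay monotonically along the vector: an activation that arrived many frames ago contributes little, which is the attenuation that keeps the neuron from responding to a static pattern.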
A uniform filling of such a neuron's receptive field, i.e. a stationary dark area over the entire receptive field, will not produce the required activation (Fig. 1). The attenuation coefficient obeys a certain law of change with t: at the beginning of the movement through the receptive field, when the neuron is first activated, the vector t takes one set of values; by the end of the receptive field it takes another. Accordingly, the value of the attenuation coefficient changes.
Fig. 1. Scheme of MT neuron operation
IV. EXPERIMENTS
For an experimental test, we run movements at different angles and check the activation of U_C2 neurons in the direction of 45 degrees. Fig. 2-4 show the frame-by-frame activation of MT neurons. Ideally, there should be a thin line, but due to the low resolution there is false activation within 10-20 degrees.
Fig. 2. First frame (start of motion; vertically: the angle of movement, horizontally: the response of MT neurons)
Fig. 3. Second frame (vertically: the angle of movement, horizontally: the response of MT neurons)
Fig. 4 shows that the maximum activation is achieved on the second frame, after which the response fades, which confirms our assumption. Attenuation is necessary so that the MT neuron does not fire on stationary objects.
Fig. 4. Third frame (end of motion; vertically: the angle of movement, horizontally: the response of MT neurons)
In general, the longer the movement lasts, the more
accurately its direction is determined.
If we increase the size of the receptive field of the space-time vector from 3 to 7, the accuracy of determining the direction of movement increases (Fig. 5-6).
Fig. 5. Fourth frame (vertically: the angle of movement, horizontally: the response of MT neurons)
Fig. 6. Seventh frame (end of motion; vertically: the angle of movement, horizontally: the response of MT neurons)
Another important aspect is the response to changes in the speed of movement within the receptive field. An increase in speed shows up as an increase in the spatial step between activations of the previous-layer neurons, while the temporal step remains constant. Thus, at normal speed the U_MT value is 4.3469; when U_C2 is activated skipping one neuron, the activation of U_MT drops sharply to 0.8448; when U_C2 is activated skipping two neurons, the activation of U_MT is 0.6883.
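This speed sensitivity can be illustrated with a toy model (our assumption, not the paper's exact weights: we centre the spatial Gaussian on the position the edge is expected to occupy at each lag for the preferred speed, so a mismatched speed misses the weight peaks; `mt_response` and its defaults are hypothetical):

```python
import numpy as np

def mt_response(pattern, v=1, sigma_s=1.0, sigma_t=1.0):
    """Response of an MT neuron tuned to speed v (positions/frame).
    pattern[p, t] is the activation of the U_C2 neuron at position p,
    t frames ago; the spatial Gaussian is centred on the position the
    edge should occupy at lag t for the preferred speed."""
    n_space, n_time = pattern.shape
    t = np.arange(n_time)
    expected = (n_space - 1) - v * t          # expected position at lag t
    p = np.arange(n_space)[:, None]
    w = (np.exp(-(p - expected)**2 / (2 * sigma_s**2))
         * np.exp(-t**2 / (2 * sigma_t**2)))
    return float(np.sum(pattern * w))

# edge moving one position per frame (the preferred speed)
normal = np.eye(5)[::-1]
# edge moving two positions per frame: activates every other position
fast = np.zeros((5, 5))
fast[[4, 2, 0], [0, 1, 2]] = 1.0
```

In this sketch the faster edge activates fewer neurons and lands off the Gaussian peaks, so its response is lower than at the preferred speed, qualitatively matching the drop reported above.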
V. CONCLUSION
The presented model of the MT neuron does not react to a stationary object, since a uniform filling of the receptive field of such a neuron gives an output value as close to zero as possible. The movement is encoded by its direction and speed: neurons with different receptive fields are responsible for the direction, and the speed can be determined from the output value of such a neuron.
The best accuracy in determining the direction of movement is obtained with a space-time vector of size (7×7, 7).
The experiments have shown that the proposed model of
the MT neuron responds to movement in the expected way.
REFERENCES
[1] Y. Cai, J. Liu, Y. Guo, S. Hu, and S. Lang, “Video anomaly detection
with multi-scale feature and temporal information fusion,”
Neurocomputing, vol. 423, pp. 264–273, Jan. 2021, doi:
10.1016/j.neucom.2020.10.044.
[2] R. Szeto, X. Sun, K. Lu, and J. J. Corso, “A Temporally-Aware
Interpolation Network for Video Frame Inpainting,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 42, no. 5, pp. 1053–1068, May 2020,
doi: 10.1109/TPAMI.2019.2951667.
[3] C. Jing, P. Wei, H. Sun, and N. Zheng, “Spatiotemporal neural
networks for action recognition based on joint loss,” Neural Comput &
Applic, vol. 32, no. 9, pp. 4293–4302, May 2020, doi: 10.1007/s00521-019-04615-w.
[4] R. Xu, X. Li, B. Zhou, and C. C. Loy, “Deep Flow-Guided Video
Inpainting,” in 2019 IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), Long Beach, CA, USA, Jun. 2019, pp.
3718–3727, doi: 10.1109/CVPR.2019.00384.
[5] G. Chen, J. Lu, M. Yang, and J. Zhou, “Spatial-Temporal Attention-
Aware Learning for Video-Based Person Re-Identification,” IEEE
Trans. on Image Process., vol. 28, no. 9, pp. 4192–4205, Sep. 2019,
doi: 10.1109/TIP.2019.2908062.
[6] D. J. Heeger, “Model for the extraction of image flow,” J Opt Soc Am
A, vol. 4, no. 8, pp. 1455–1471, Aug. 1987, doi:
10.1364/josaa.4.001455.
[7] E. P. Simoncelli and D. J. Heeger, “A model of neuronal responses in
visual area MT,” Vision Research, vol. 38, no. 5, pp. 743–761, Mar.
1998, doi: 10.1016/S0042-6989(97)00183-1.
[8] M. Chessa, S. P. Sabatini, and F. Solari, “A systematic analysis of a
V1–MT neural model for motion estimation,” Neurocomputing, vol.
173, pp. 1811–1823, Jan. 2016, doi: 10.1016/j.neucom.2015.08.091.
[9] A. V. Kugaevskikh and A. A. Sogreshilin, “Analyzing the Efficiency
of Segment Boundary Detection Using Neural Networks,”
Optoelectron.Instrument.Proc., vol. 55, no. 4, Art. no. 4, Jul. 2019, doi:
10.3103/S8756699019040137.
[10] ISO 11664-4:2008. Colorimetry. Part 4: CIE 1976 L*a*b* colour space. 2008-11-01.