Xiaoyu Chen's research while affiliated with Tianjin University of Finance and Economics and other places

What is this page?


This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically generated by ResearchGate to provide a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (1)


Pedestrian X changes her trajectory in advance due to the influence of neighbor Y, who is far away but moving fast
The architecture of the crowd interaction residual attention network (CIRAN). Our model contains four key components: a sequence feature extractor, a velocity coding module, a spatial and velocity feature extractor, and a feature integration module. First, the location displacements of the input trajectories are extracted by the sequence feature extractor, which is based on a GRU. At the same time, the coordinates of all pedestrians in the scene at frames t − 1 and t are fed into the velocity coding module to obtain the velocity cosine similarity matrix and the pedestrian distribution matrix. Then, the velocity cosine similarity matrix is merged with the pedestrian distribution matrix by the spatial and velocity feature extractor, which feeds the subsequent feature extraction in the residual attention module
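A minimal PyTorch sketch of how the four components named in this caption could be wired together. The hidden sizes, grid resolution, and the internals of each block are assumptions for illustration; only the four-stage structure (GRU sequence encoder, velocity-coding inputs, spatial/velocity feature extractor, feature integration) comes from the caption, and this is not the authors' released implementation.

```python
# Hypothetical sketch of the CIRAN pipeline described in the caption.
# Hidden sizes, the grid resolution and the module internals are assumptions;
# only the four-component structure comes from the figure caption.
import torch
import torch.nn as nn


class CIRANSketch(nn.Module):
    def __init__(self, hidden_dim=64, grid_size=8):
        super().__init__()
        # 1) Sequence feature extractor: GRU over per-step displacements (dx, dy).
        self.sequence_encoder = nn.GRU(input_size=2, hidden_size=hidden_dim, batch_first=True)
        # 2)-3) Spatial and velocity feature extractor: stands in for the residual
        #       attention module that processes the grid-based interaction maps
        #       produced by the velocity coding module.
        self.interaction_encoder = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, hidden_dim),
        )
        # 4) Feature integration module: fuses motion and interaction features
        #    and predicts the next displacement.
        self.integrate = nn.Linear(2 * hidden_dim, 2)

    def forward(self, displacements, interaction_maps):
        # displacements:    (batch, T, 2)   past per-frame displacements of the target
        # interaction_maps: (batch, 2, G, G) stacked velocity-similarity and
        #                   pedestrian-distribution grids from the velocity coding module
        _, h = self.sequence_encoder(displacements)
        motion_feat = h[-1]                                # (batch, hidden_dim)
        social_feat = self.interaction_encoder(interaction_maps)
        return self.integrate(torch.cat([motion_feat, social_feat], dim=-1))


if __name__ == "__main__":
    model = CIRANSketch()
    out = model(torch.randn(4, 8, 2), torch.randn(4, 2, 8, 8))
    print(out.shape)  # torch.Size([4, 2]) -- predicted next displacement
```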
A sample crowded scene
Velocity coding module. The right part illustrates the spatial distribution between pedestrian 2 and his neighbors. The module calculates the average cosine similarity within every grid cell. It is inspired by the idea of spatial pyramid pooling (SPP), which extracts and processes scene features from images of different sizes. Through the velocity cosine similarity matrix, it converts the influence of all neighboring pedestrians on the target pedestrian into the influence of all spatial grid cells in the scene on the target pedestrian
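A hypothetical NumPy sketch of the grid-averaging idea in this caption: neighbors are binned into a local grid around the target pedestrian, the cosine similarity between each neighbor's velocity and the target's velocity is averaged per cell, and a second matrix counts the neighbors per cell. The grid size, cell width, and treatment of empty cells are assumptions not stated in the preview.

```python
# Hypothetical sketch of the velocity coding idea: per spatial grid cell around
# a target pedestrian, average the cosine similarity between neighbor velocities
# and the target's velocity, and count the neighbors falling in that cell.
import numpy as np


def velocity_coding(target_pos, target_vel, neighbor_pos, neighbor_vel,
                    grid_size=8, cell=0.5):
    """Return (similarity_grid, distribution_grid), each of shape (grid_size, grid_size)."""
    sim_grid = np.zeros((grid_size, grid_size))
    count_grid = np.zeros((grid_size, grid_size))
    half = grid_size * cell / 2.0
    for pos, vel in zip(neighbor_pos, neighbor_vel):
        offset = pos - target_pos
        if np.any(np.abs(offset) >= half):
            continue  # neighbor lies outside the local grid
        ix, iy = ((offset + half) // cell).astype(int)
        denom = np.linalg.norm(vel) * np.linalg.norm(target_vel)
        cos_sim = float(vel @ target_vel / denom) if denom > 0 else 0.0
        sim_grid[ix, iy] += cos_sim
        count_grid[ix, iy] += 1
    # Average similarity per occupied cell; empty cells stay zero.
    occupied = count_grid > 0
    sim_grid[occupied] /= count_grid[occupied]
    return sim_grid, count_grid


if __name__ == "__main__":
    tgt_p, tgt_v = np.array([0.0, 0.0]), np.array([1.0, 0.0])
    nb_p = np.array([[1.2, 0.3], [-0.8, -0.4]])
    nb_v = np.array([[0.9, 0.1], [-1.0, 0.2]])
    sim, dist = velocity_coding(tgt_p, tgt_v, nb_p, nb_v)
    print(dist.sum())  # 2.0 -- both neighbors fall inside the local grid
```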
The architecture of the spatial and velocity feature extractor. It follows the residual attention network [35] and uses three hyper-parameters in the design of the attention module: p, t and r. We choose the setting {p = 1, t = 2, r = 1}. The number of channels in the soft-mask residual units and in the corresponding trunk branches is the same
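A rough PyTorch sketch of an attention block parameterized by p, t and r in the spirit of the residual attention network cited above, using the caption's setting {p = 1, t = 2, r = 1}: p pre/post residual units, t trunk units, and r units in a soft-mask branch whose channel count matches the trunk. The channel count, the single down/up-sampling step, and the (1 + mask) * trunk combination follow the general residual-attention recipe rather than the paper's exact design.

```python
# Hypothetical residual-attention block with the hyper-parameters named in the
# caption (p pre/post residual units, t trunk units, r soft-mask units).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class AttentionModule(nn.Module):
    def __init__(self, channels, p=1, t=2, r=1):
        super().__init__()
        self.pre = nn.Sequential(*[ResidualUnit(channels) for _ in range(p)])
        self.trunk = nn.Sequential(*[ResidualUnit(channels) for _ in range(t)])
        # Soft mask branch: same channel count as the trunk branch (per the caption).
        self.mask_units = nn.Sequential(*[ResidualUnit(channels) for _ in range(r)])
        self.mask_out = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.post = nn.Sequential(*[ResidualUnit(channels) for _ in range(p)])

    def forward(self, x):
        x = self.pre(x)
        trunk = self.trunk(x)
        mask = F.max_pool2d(x, 2)              # bottom-up: shrink spatial extent
        mask = self.mask_units(mask)
        mask = F.interpolate(mask, size=trunk.shape[-2:], mode="bilinear",
                             align_corners=False)  # top-down: restore resolution
        mask = self.mask_out(mask)
        out = (1 + mask) * trunk               # attention residual learning
        return self.post(out)


if __name__ == "__main__":
    block = AttentionModule(channels=16)
    print(block(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```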


CIRAN: extracting crowd interaction with residual attention network for pedestrian trajectory prediction
  • Article
  • Publisher preview available

September 2022 · 54 Reads

International Journal of Machine Learning and Cybernetics

Shang Liu · Xiaoyu Chen · Hao Chen

This paper proposes a new deep learning network based on a spatial attention mechanism, the crowd interaction with residual attention network (CIRAN), which combines the position and velocity information of neighboring pedestrians for trajectory prediction. Its residual attention module adaptively selects the most informative areas of the scene, yielding more accurate and more plausible predicted trajectories. In addition, a velocity encoding module is introduced to transform the coordinate-based pedestrian social interaction process into a spatial-grid-based one. On the two public datasets ETH and UCY, the proposed CIRAN achieves state-of-the-art experimental results, demonstrating its effectiveness.
