All content in this area was uploaded by Meng Zhang on Jul 02, 2022
Cooperative Behaviors of Connected Autonomous Vehicles and
Pedestrians to Provide Safe and Efficient Traffic in Industrial Sites
Meng Zhang1, Alexandre Brunoud1, Alexandre Lombard1, Yazan Mualla1,
Abdeljalil Abbas-Turki1, Abderrafiaa Koukam1
Abstract— Connected and Autonomous Vehicle (CAV) technology is
an active topic in transportation systems, especially
regarding platooning and the interaction with other
road users. Considering traffic safety, many studies have been
devoted to the exchange of information among various road
users, such as CAVs and pedestrians. In a platooning scenario,
when a pedestrian is detected by a CAV, the leader CAV
shares the information with its followers to provide a safe
and courteous environment thanks to its connectivity. However,
the possibility to improve traffic efficiency while meeting the
safety requirements has rarely been addressed in current
research. Yet, in industrial areas, where automated vehicles and
pedestrians frequently interact, combining safety and efficiency
is crucial. The present paper addresses this challenge by first
analyzing the interaction between CAVs and pedestrians at crossings
without traffic signals. An optimal CAV state is proposed to reduce
the time loss. Then, the paper uses a reinforcement learning-based
method to drive CAVs to this optimal state and thereby
improve traffic efficiency. Experimental results obtained in
virtual reality show that the proposed method increases traffic
efficiency while ensuring traffic safety.
I. INTRODUCTION
With the increase of Advanced Driver Assistance Systems,
the prospect of a truly autonomous vehicle has become more
concrete. In response to it, a brand-new domain has emerged,
called Connected and Autonomous Vehicles (CAV). This
technology aims at improving the current transportation system
in terms of safety and efficiency. Indeed, research shows
that CAVs can reduce travel time and energy consumption [1]
[2].
Autonomous and connected vehicle technologies are usu-
ally applied in closed industrial sites to ensure a stable
environment. The complexity of CAV tasks can then be
gradually increased. Both traffic safety and traffic efficiency
are major requirements. Regarding safety, many studies focus on
the interaction between CAVs and other road users, such as
vulnerable pedestrians. When CAVs and pedestrians interact
without a traffic light, safety remains challenging. Popular and
original solutions enhance the information exchange
between CAVs and pedestrians. The research in [3] [4] highlights
the importance of pedestrian-driver interaction for traffic
safety. The interaction methods proposed by many researchers
are summarized in [5]. Pedestrian behaviors while crossing
an intersection area are studied in [6] [7].
*This work is supported by China Scholarship Council
1The authors are with Connaissance et Intelligence Artificielle
Distribuées, Université de Bourgogne Franche-Comté UTBM, UTBM (Site de
Belfort), 90010 Belfort Cedex, France. meng.zhang@utbm.fr
Fruitful results have been achieved on this topic. Building on
these studies, it also makes sense to consider
optimizing traffic efficiency while ensuring traffic safety,
especially in closed industrial areas. However, when it comes
to traffic efficiency at pedestrian-vehicle intersections,
results are very limited.
In our studied scenario, pedestrians wait in front of an
intersection area (without traffic light) crossed by a
stream of CAVs. Pedestrians and CAVs interact to find an
appropriate crossing time for both. While
pedestrians cross, CAVs first control their own speed
to keep a safe distance. Then, CAVs use the control algorithm
to reduce the time loss under the safety conditions.
To improve the traffic efficiency, we calculate the optimal
state (position-speed) for CAVs during the pedestrian cross-
ing process. The time loss can be decreased for both CAVs
and pedestrians when CAVs achieve the optimal state. How-
ever, the pedestrian behavior is complex because it depends
on factors (cultural, environmental, personal) as studied in
[8]. Hence, it makes the control and optimization difficult.
Specifically, it is hard to make CAVs reach the optimal state
because of the uncertainty when pedestrians cross. To address
this problem, we use a Deep Reinforcement Learning (DRL)
method to train the CAV actions to reach the optimal state.
The Deep Deterministic Policy Gradient (DDPG), which is
a model-free off-policy algorithm for learning continuous
actions, is selected because the problem requires continuous
observation and action spaces. This kind
of algorithm has the advantage of dealing with long-term
rewards. The reward function is built to optimize both safety
and efficiency.
To obtain more realistic experimental results, the paper
uses a new experimental method based on the Oculus Quest
headset, which is a virtual reality platform. It allows us to test
our model on real pedestrian behaviors in a more realistic
way. Human testers enter a virtual scene containing a
stream of running CAVs. The testers try to cross a road and
interact with CAVs.
The remainder of the paper is organized as follows: Section II
reviews the state of research on vehicle-pedestrian
interaction and on handling the intersection between them.
Section III-A presents the model and calculates the optimal
trajectory of the CAV, and Section III-B constructs a DRL model
to solve the CAV control problem. Section IV designs the
virtual reality experiments. Results obtained with the
DRL method, Quadratic Programming (QP), and the classical
way are compared and analyzed. Section V concludes the
paper and introduces our future work.
Fig. 1. Interactive behaviors between autonomous vehicles and pedestrians.
(a) AEVITA moving eye concept [9] (source [10]), (b) AutonoMI pedestrian
detection and tracking indicator, (c) Mercedes-Benz rear-end LEDs showing
that a pedestrian is crossing in front of the car, (d) an array of LEDs
indicating yield [4], (e) Mitsubishi forward indicator [11], (f) an advisory
display for crossing [12], (g) Examples of UMBRELLIUM smart crossing
[13], and (h) Cooperative road crossing by CAV signaling system [14].
II. RELATED WORK
To improve traffic safety at the intersection between au-
tonomous vehicles and pedestrians, many researchers try
to strengthen information exchange. The work in [5] summarizes
some methods and new concepts of vehicle-pedestrian
interaction from recent years. Some popular approaches
are shown in Fig. 1. In Fig. 1 (a), the AEVITA
system tries to close the information gap by giving the
autonomous vehicle the means to first sense other objects in its
driving environment and then show this information
to others. In Fig. 1 (b, d, h), autonomous vehicles show their
intention to pedestrians through a signal light, which realizes
the communication between vehicles and pedestrians. In Fig.
1 (c), the vehicle displays information to tell the following
vehicle that a pedestrian is crossing. This improves the safety
of the crossing pedestrian by avoiding dangerous overtaking.
In Fig. 1 (e), Mitsubishi Electric Corporation provides an
innovative directional-indicator system that illuminates road
surfaces at night to inform pedestrians and other drivers of a
vehicle’s intended path forward/backward, or when turning,
opening doors or making emergency stops. In Fig. 1 (f), a
van representing an autonomous vehicle displays information
telling pedestrians when to cross a street, in order to
study the interaction in different conditions. In Fig. 1 (g),
the intelligent road system obtains environmental information
through ground facilities, and then displays instructions on
the road to improve traffic safety and efficiency.
Fig. 2. Studied scenario: the pedestrian and CAVs share the road space
“CZ”.
All of the above methods and concepts are aimed at
enhancing the information exchange between vehicles and
pedestrians or other road users to ensure the safety of
pedestrians, and fruitful results have been achieved. However,
when it comes to traffic efficiency at pedestrian-vehicle
intersections, there are few relevant studies. Traffic safety
is the first concern, while traffic efficiency remains an im-
portant and challenging issue, especially in closed industrial
areas. The research in [15] regards CAVs and pedestrians
as agents and provides us with optimal speed profiles of
both agents by using sequence modeling based on Petri-
Net and Hamiltonian analysis. However, as can be noticed,
solving the problem from the perspective of multi-agent
interaction requires strong assumptions about the pedestrian
behavior. Accurate estimates of pedestrian parameters ($\tilde{v}_p$:
desired speed, $\overline{v}_p$: maximum speed, and $\tau_p$: reaction time) are
required in practice. In addition, unforeseeable
events, such as a pedestrian falling, call the
multi-agent optimal control approach into question entirely. Nevertheless,
this work paves the way for optimizing the conflict
between pedestrians and CAVs, allowing both agents to gain
time as follows:
• The CAV invites the pedestrian to cross earlier by
providing a reassuring safety margin as soon as possible.
• When the pedestrian leaves the road, the state of the
CAV (its position and speed) is not only safe but also
optimal to resume the journey.
In addition to these two optimization parameters, the multi-
agent optimal control exhibits a kind of courtesy between
both agents. The agent that crosses the shared road first
speeds up a little to free the road as soon as possible for the second
agent. Recall that the multi-agent optimal control assumes
that both agents respect their optimal trajectory. In this paper,
we assume that pedestrian movements are random and that the
whole optimization burden is carried by the CAV. To this end,
DRL and QP are compared.
III. MODELING AND CONTROL
A. Optimal trajectory of CAV
Fig. 2 presents our main idea about the intersection
between CAVs and pedestrians in industrial zones. As shown,
CAVs are running at their desired speed $\tilde{v}_v$ on a single-lane
road, while a pedestrian on the side of the road wants to
cross at some moment, in an area where pedestrians are allowed
to cross. It is assumed that the pedestrian’s desired speed is
$\tilde{v}_p$. After the detection of the pedestrian, the system creates
a virtual area called “Conflict Zone” (CZ), shown as the
red rectangle. For safety reasons, CAVs and the pedestrian
are forbidden to be in the CZ at the same time. For CAV1, if there
is not enough distance to brake and let the pedestrian cross
first, CAV1 shows a red light, as shown in Fig. 1 (h), to
tell the pedestrian to wait, and CAV1 shares this message with
CAV2. In this way, CAV2 can prepare itself as early as
possible to yield to the pedestrian. The objective is
to control the CAVs’ longitudinal speed to improve traffic safety
and efficiency. More precisely, it is to avoid collisions and decrease
the time lost by CAVs and pedestrians during the crossing.
For traffic safety, the technology used in Fig. 1 (g)
can guarantee information exchange between road users. In addition,
the speed of the CAV is limited according to its distance to the CZ
until the pedestrian exits, as expressed in equation (1), where $x_v$,
$u_v$, and $v_v$ are the CAV position, acceleration, and speed, respectively.
This means that the actual distance to the CZ is no less than
the shortest braking distance plus a margin equal to $v_v$ multiplied by a
positive gain $\tau$.

$$|x_v(t)| - \frac{l}{2} \geq -\frac{v_v^2(t)}{2\,\underline{u}_v} + \tau v_v(t) \quad (1)$$

The speed $v_v$ and acceleration $u_v$ are constrained as
follows:

$$u_v(t) - \overline{u}_v \leq 0 \quad (2)$$
$$\underline{u}_v - u_v(t) \leq 0 \quad (3)$$
$$v_v(t) - \overline{v}_v \leq 0 \quad (4)$$
$$0 - v_v(t) \leq 0 \quad (5)$$

where $\overline{u}_v$ and $\underline{u}_v$ designate the maximum and the minimum
acceleration (deceleration), respectively. The maximum
speed is $\overline{v}_v$.
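As an illustration, constraints (1) to (5) can be checked for a single CAV sample as in the minimal Python sketch below. The function names, the argument order, and the value `l = 10.0` used in the usage note are hypothetical choices for this example and are not taken from the paper; the sketch also assumes that positions are measured from the centre of the CZ.

```python
def safe_gap(v, u_min, tau):
    """Right-hand side of (1): braking distance at u_min (negative) plus tau * v."""
    return -v ** 2 / (2.0 * u_min) + tau * v


def satisfies_constraints(x, v, u, l, u_min, u_max, v_max, tau):
    """Return True when one (position, speed, acceleration) sample meets (1)-(5).

    x is the CAV position relative to the CZ centre (assumption of this sketch),
    l is the CZ length appearing as l/2 in (1).
    """
    return (
        abs(x) - l / 2.0 >= safe_gap(v, u_min, tau)  # (1) safety distance
        and u <= u_max                               # (2) maximum acceleration
        and u >= u_min                               # (3) minimum acceleration
        and v <= v_max                               # (4) maximum speed
        and v >= 0.0                                 # (5) no reversing
    )
```

For instance, `satisfies_constraints(-20.0, 4.8, 0.0, 10.0, -3.0, 2.0, 12.0, 1.0)` evaluates the sample of a CAV 20 m before the CZ centre at 4.8 m/s, with an illustrative CZ length of 10 m.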
With the above definitions, we can analyze the
efficiency in the case where the pedestrian crosses first. Because
the CAV maintains a safe distance to the pedestrian while
she/he is in the CZ (equation (1)), there is an optimal speed
and gap that let the CAV be the furthest from the CZ after the
pedestrian exits. The CAV state before the pedestrian exits the CZ is
defined as $S = (x_v, v_v)$. The optimal state $S^*$ is the state from which
the CAV reaches the furthest position when it accelerates to the
desired speed $\tilde{v}_v$.
To illustrate the concept, Fig. 3 shows the relationship
between the safe gap and the final position of the CAV. The
optimal state $S^*$ is given by expression (6), where
the notation is simplified.

$$S^* = \left( \frac{(\overline{u}^2 - 2\,\overline{u}\,\underline{u} + 2\,\underline{u}\,\tilde{v} - \tilde{v}^2)\,\underline{u}}{2(\overline{u} - \underline{u})^2} - \frac{l}{2},\; \frac{(\tau\,\overline{u} - \tilde{v})\,\underline{u}}{\overline{u} - \underline{u}} \right) \quad (6)$$
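The speed component of expression (6) can be cross-checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' derivation: it assumes that, at the pedestrian exit, the CAV sits exactly on the boundary of constraint (1), then accelerates at the maximum acceleration up to the desired speed and cruises, and it compares the positions reached after the time $\tilde{v}_v / \overline{u}$ (the horizon used in Fig. 3). The function names are hypothetical; the parameter values are those listed in Section IV-A.

```python
def final_position_offset(v, v_des, u_min, u_max, tau):
    """Position reached v_des/u_max seconds after the pedestrian exits,
    measured from -l/2, for a CAV on the boundary of (1) with speed v."""
    gap = -v ** 2 / (2.0 * u_min) + tau * v             # safe gap from (1)
    accel_dist = (v_des ** 2 - v ** 2) / (2.0 * u_max)  # distance while accelerating
    cruise_dist = v_des * v / u_max                     # remaining time v/u_max at v_des
    return -gap + accel_dist + cruise_dist


def optimal_speed(v_des, u_min, u_max, tau):
    """Speed component of S* as written in expression (6)."""
    return (tau * u_max - v_des) * u_min / (u_max - u_min)


V_DES, U_MIN, U_MAX, TAU = 10.0, -3.0, 2.0, 1.0  # values from Section IV-A

v_star = optimal_speed(V_DES, U_MIN, U_MAX, TAU)
grid = [i * 0.01 for i in range(0, 1001)]  # candidate speeds from 0 to 10
v_best = max(grid, key=lambda v: final_position_offset(v, V_DES, U_MIN, U_MAX, TAU))
print(v_star, v_best)
```

Under these assumptions, both the closed form and the grid search give a speed of about 4.8, in agreement with the speed component of $S^*$ reported in Section IV-A. The position component then follows from (1) once $l$ is fixed.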
B. DRL model
1) Network and observations: Fig. 4 shows the process
of handling the intersection problem with DRL in our
studied scenario. The cycle begins with the Agent observing
the Environment and receiving a state and a reward. The
Agent uses this state and reward to decide the next action
to take. The Agent then sends an action to the Environment
in an attempt to control it favorably. Finally, the Environment
transitions, and its internal state changes as a consequence
of the previous state and the Agent’s action. Then,
the cycle repeats. Specifically, the action is the acceleration
applied to the CAV. The observations are information about
the CAV and the pedestrian.
Fig. 3. The relationship between $S$ and the position after time $\tilde{v}_v / \overline{u}$ when
CAVs, with all possible $S$, can reach the desired speed $\tilde{v}_v$. The CAV with
optimal state $S^*$ will run the farthest.
Fig. 4. DRL architecture for the problem of intersection between a CAV
and a pedestrian.
In this article, a DDPG agent is selected because the
problem requires continuous observation and
action spaces. DDPG is an off-policy method and entirely
model-free. Detailed information about the DDPG agent
and its variables can be found in [16]. For the structure
of the networks, we adapt the structure presented
in Fig. 3 of [17], which has proved effective in car
following. The observations are $\Delta L$ (see equation (7)), $x_v(t)$,
$v_v(t)$, $v_v(t) - \tilde{v}_v$, $\int (v_v(t) - \tilde{v}_v)^2 \, dt$, $x_p(t)$, $v_p(t)$, and $f_p(t)$
($f_p(t) = 1$ when the pedestrian is in the CZ, otherwise 0).
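The observation list above could be assembled at each time step roughly as in the following Python sketch. It is only an illustration: the class and function names, the rectangle-rule integration with a fixed step `dt`, and the ordering of the vector are assumptions of this example, and $\Delta L$ is taken as the slack of safety constraint (1).

```python
from dataclasses import dataclass


@dataclass
class SpeedErrorIntegrator:
    """Running integral of (v_v - v_des)^2, approximated with a fixed step dt."""
    total: float = 0.0

    def update(self, v, v_des, dt):
        self.total += (v - v_des) ** 2 * dt
        return self.total


def build_observation(x_v, v_v, x_p, v_p, ped_in_cz, v_des, l, u_min, tau, integ, dt):
    """Return the 8 observations in the order they are listed in the text."""
    # Slack of constraint (1): non-negative when the CAV keeps the safe distance.
    delta_l = abs(x_v) - l / 2.0 - (v_v ** 2 / (-2.0 * u_min) + tau * v_v)
    f_p = 1.0 if ped_in_cz else 0.0  # pedestrian-in-CZ flag
    return [
        delta_l,
        x_v,
        v_v,
        v_v - v_des,
        integ.update(v_v, v_des, dt),
        x_p,
        v_p,
        f_p,
    ]
```

A single integrator instance would be kept for the whole episode and reset when a new episode starts, so that the fifth entry accumulates the squared speed error over time.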
2) Reward function: To train the DDPG agent to
reach state $S^*$ when pedestrians exit the road, the
reward function given in expression (7) is defined. In this
function, $r_1$ is the reward for the safety constraint,
equation (1). If the CAV state fulfills the safety constraint
($\Delta L \geq 0$), the agent receives a positive reward ($0.01 \cdot w_1$);
otherwise it receives a negative reward. $r_2$ is the final position (at
$t = t_{final}$, the time when the CAV reaches the desired speed) that
the CAV can reach. It evaluates the state of the CAVs when
pedestrians exit their respective lanes. The reward $r_2$ is
higher when the state of the CAVs is closer to the optimal state $S^*$.
Therefore, this reward guides the agent toward the optimal
state during training. $r_3$ punishes the agent if CAVs
exceed the speed $\overline{v}_v$. In $r_4$, $u_{t-1}$ is the acceleration from
the previous time step; $r_4$ serves to decrease the control
effort. The full DDPG learning process can be found
in [16].

$$
\begin{aligned}
r(t) &= w_1 \cdot r_1 + w_2 \cdot r_2 - w_3 \cdot r_3 - w_4 \cdot r_4 \\
r_1 &= \begin{cases} 0.01, & \Delta L \geq 0 \\ \Delta L, & \Delta L < 0 \end{cases} \\
r_2 &= x_v(t_{final}), \quad (t = t_{p,exit}) \\
r_3 &= v_v(t) - \overline{v}_v, \quad (v_v(t) > \overline{v}_v) \\
r_4 &= u_{t-1}^2, \quad (t \leq t_{p,exit}) \\
\Delta L &= |x_v(t)| - \frac{l}{2} - \left( \frac{v_v^2(t)}{-2\,\underline{u}} + \tau v_v(t) \right)
\end{aligned}
\quad (7)
$$
Fig. 5. The training curves of the episode reward and average reward.
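A per-step evaluation of the reward in expression (7) might look like the Python sketch below. This is an illustrative reading, not the authors' MATLAB implementation: the function name, the use of integer step indices for the pedestrian exit, and the argument `x_final` (the position the CAV is predicted to reach at $t_{final}$, assumed to be computed elsewhere) are hypothetical. The default weights are those reported in Section IV-A.

```python
def step_reward(step, exit_step, x_v, v_v, u_prev, x_final,
                l, u_min, v_max, tau, weights=(50.0, 2.0, 10.0, 0.01)):
    """Reward of expression (7) for one time step.

    exit_step is the step at which the pedestrian leaves the CZ,
    or None while the pedestrian has not left yet.
    """
    w1, w2, w3, w4 = weights
    # Slack of the safety constraint (1).
    delta_l = abs(x_v) - l / 2.0 - (v_v ** 2 / (-2.0 * u_min) + tau * v_v)
    r1 = 0.01 if delta_l >= 0.0 else delta_l           # safety term
    r2 = x_final if step == exit_step else 0.0         # granted only at the exit step
    r3 = v_v - v_max if v_v > v_max else 0.0           # overspeed penalty
    before_exit = exit_step is None or step <= exit_step
    r4 = u_prev ** 2 if before_exit else 0.0           # control effort until exit
    return w1 * r1 + w2 * r2 - w3 * r3 - w4 * r4
```

With the default weights, a safe step before the pedestrian exit yields $50 \cdot 0.01$ minus the small control-effort term, while an unsafe step is dominated by $50 \cdot \Delta L$, which reflects the priority given to safety by the weighting.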
IV. EXPERIMENTS AND RESULTS
A. Training setting
The DDPG agent is implemented in
MATLAB, and the DRL environment is built in
SIMULINK. The weighting factors in expression (7) are set to
$w_1 = 50$, $w_2 = 2$, $w_3 = 10$, $w_4 = 0.01$. The other parameters are
$\tilde{v}_v = 10$, $\overline{u} = 2$, $\underline{u} = -3$, $\overline{v}_v = 1.2\,\tilde{v}_v$, $\tau = 1$, and road width 4 m.
According to expression (6) and the studied scenario parameters, the
optimal state $S^*$ is $(-13.64\,\mathrm{m}, 4.8\,\mathrm{m/s})$.
The cooperative control area is set to $(-100, 0)$,
and the crossing speed of pedestrians is set within $(0, 2)$ m/s.
Pedestrians stop in the CZ with a probability of 0.2, so that
the agent is also trained to deal with dangerous
situations. When launching a training episode, we randomly
initialize the positions of the CAVs in the defined cooperative
control area, and they drive normally at the desired speed.
Fig. 5 shows the training curve, which converges well.
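The episode initialization described above can be sketched in Python as follows. The paper only gives the ranges and the stop probability; drawing the values from uniform distributions, as well as the function and key names, are assumptions made for this illustration.

```python
import random


def sample_episode(rng, area=(-100.0, 0.0), ped_speed=(0.0, 2.0),
                   stop_prob=0.2, v_des=10.0):
    """Draw the initial conditions of one training episode.

    Uniform sampling is an assumption of this sketch; the text only
    states the ranges and the probability of stopping in the CZ.
    """
    return {
        "cav_position": rng.uniform(*area),          # within the cooperative control area
        "cav_speed": v_des,                          # CAVs start at the desired speed
        "ped_speed": rng.uniform(*ped_speed),        # pedestrian crossing speed in m/s
        "ped_stops_in_cz": rng.random() < stop_prob, # dangerous situation flag
    }


rng = random.Random(0)  # fixed seed so that runs are repeatable
episodes = [sample_episode(rng) for _ in range(2000)]
```

Over many episodes, roughly one in five samples has `ped_stops_in_cz` set, which is how the dangerous situations mentioned in the text would be exposed to the agent in this sketch.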
B. Experiments and results
The purpose of these experiments is to test whether the developed
algorithm copes with the behaviors of real pedestrians and
whether it improves traffic efficiency. Experiments have been
conducted with a mixed reality platform. The testers play the
role of pedestrians and are asked to physically cross a road
in a virtual environment provided by the Oculus Quest. This
device ensures safety during the experiments and provides
immersive scenes that are as close as possible to the real
world. In all the experiments, testers interact with a stream of
unmanned vehicles: no driver is visible in the virtual vehicles.
1) Comparison of DRL and QP: To test the effectiveness
of the trained agent and its advantages, we compare CAV
behaviors under the DRL controller and under QP. For the design of
the QP controller, we use the same speed and acceleration
constraints for the CAVs, then estimate the pedestrian crossing
time, and plan for the CAV to be in the optimal state $S^*$ when
the pedestrian leaves the CZ. We compute a QP solution on
the pedestrian test data (at each step), and the QP outputs the
CAV acceleration.
In the result shown in Fig. 6 (e), we see the classic
pattern of a pedestrian crossing: looking at and analyzing the
CAV trajectory, starting to cross while looking at the CAV
(ensuring safety and mutual understanding with the CAV), and finally
crossing without looking (clearing the path) [3].
We then analyze the position, speed, and acceleration
profiles of both solutions and check whether the safety
constraint has been respected. In Fig. 6 (a), the DRL-controlled CAV
reduces its speed earlier to invite the pedestrian to cross
as soon as possible. Indeed, the DRL method can anticipate
pedestrian behavior, whereas QP only uses the current
pedestrian features. Likewise, the DRL-controlled CAV leaves the CZ
earlier and at a higher speed than with the QP solution. We can notice
in Fig. 6 (b) that the safety constraint is respected as long
as the pedestrian has not left the crosswalk. Moreover,
the DRL acceleration profile is smoother. Therefore, the DRL-controlled
CAV achieves better efficiency than the
QP solution while maintaining the safety constraint.
2) Multiple lanes and CAVs: In this experiment, we test
our algorithm in a more complex environment with two
lanes and 4 CAVs. The scenario and results are shown in
Fig. 7. All the CAVs act at the same time to communicate
a safe trajectory to the pedestrian. This test also shows all
the possible actions of the CAVs: yield or drive through,
for both the first and second lanes. When a CAV is too close,
it accelerates (with a red signal) to shorten the pedestrian
waiting time. When a CAV can stop, a green signal is
displayed and the CAV slows down until the pedestrian
leaves the crosswalk. The test is conclusive: the CAVs are
close to the optimal state before the pedestrian exits, which
ensures safety while performing well.
3) Comparison of DRL and classical way: To highlight
the advantages of the proposed approach in terms of efficiency,
a comparison is performed with the classical slow-down
behavior. We ran the test with 28 human testers. Each did
the experiment twice. In the first run, CAVs without signal lights
keep their maximal speed under safety conditions as described
in Fig. 3. In the second run, CAVs with a colored signal light use
the DRL controller. In all scenarios, pedestrians start crossing
when they feel safe.
In the spontaneous scenario, we observe two groups of
pedestrians. The first group (35.71%) forces the CAV to stop
using hesitating movements, waits for the complete stop of
the CAV, and then slowly crosses the street. The second group
of pedestrians (64.28%) uses the rolling gap strategy [8]:
testers wait for a sufficient gap in the traffic flow to
cross.
On average, the pedestrian crossing delays the CAV by
6.21 s ± 3.03 s in the spontaneous scenario, whereas in the
DRL scenario, the delay equals 3.85 s ± 2.20 s. The DRL
agent thus saves the CAV 2.36 s on average. The crossing
of the CAV delays the pedestrian by 5.25 s ± 1.03 s in the spontaneous
scenario, whereas in the DRL scenario, the delay equals
3.53 s ± 1.60 s. The DRL agent thus saves the pedestrian
1.72 s on average. The large standard deviation
is due to 6 crossings among the 28 with a
low gain. More precisely, there are 4 crossings where the
CAV was initially far from the CZ and 2 crossings where the
pedestrian’s decision was based only on the displayed
green signal, without waiting for the CAV to begin slowing
down.
Fig. 6. A real person participates in the game by using an immersive helmet technology.
Fig. 7. CAV speed profiles with 2 lanes and 4 CAVs: The first CAVs in all lanes accelerate to $\overline{v}_v$ and the following CAVs controlled by DRL slow
down at the same time. After the first CAVs pass through the CZ, pedestrians start crossing. When the pedestrians exit the lane, the state of the CAV in this
lane is close to $S^*$. In the results, the average pedestrian waiting time is 3.6 s (STD: 0.45). The average CAV positions are 27.1 (STD: 1.59) and −27.9
(STD: 1.90) 5 s after the pedestrians exit their lane.
Fig. 8. A comparison between three scenarios: DRL method, QP solution, and spontaneous way. The arrows represent CAV speed.
Fig. 8 shows the crossing process under three
different methods: the spontaneous way, DRL, and QP. The
process clearly explains why DRL has advantages for this issue.
In the spontaneous way, the CAV must stop in front of the pedestrian,
whereas the DRL-controlled CAV slows down earlier and reaches a
higher speed at the end than with the QP solution.
V. CONCLUSION
Improving the handling of a single CAV-pedestrian conflict is still an
open topic. Many contributions have been made in terms
of signaling systems to show the vehicle’s intention to the
pedestrian and to alert the other road users to assure a
protected crossing. In this paper, we consider the industrial
environment, where efficiency is also an objective to reach.
To this end, the speed profile of the CAV is subject to an
optimization process to minimize the time lost by the CAV
and the pedestrian, while keeping a safe distance. To account for
the randomness of the pedestrian parameters, DRL is used
and compared with the spontaneous behavior of the CAV and
with an optimized speed profile based on QP. The experiments
are conclusive and are summarized in Fig. 8. The results
demonstrate the relevance of the DRL method for addressing the
coordination between CAVs and pedestrians.
The present work focuses on the industrial environment
where pedestrians are trained, and the crossing perimeters are
well delimited. In this context, the obtained results show that
this work deserves to be extended. In terms of optimization,
it would be interesting to extend the approach to several lanes (more
than two lanes) and several crossing zones. From the human-machine
interaction perspective, connected wearable devices, such as
vibrating safety jackets, and ground signalization are worth
studying to improve both safety and efficiency. Finally,
in a more general context, as in urban areas, the results invite
us to study the speed profile as a vector of communication
with the other road users, in addition to the other studied
signaling systems.
REFERENCES
[1] D. J. Fagnant and K. Kockelman, “Preparing a nation for autonomous
vehicles: opportunities, barriers and policy recommendations,” Trans-
portation Research Part A: Policy and Practice, vol. 77, pp. 167–181,
2015.
[2] V. Astarita and V. P. Giofré, “From traffic conflict simulation to traffic
crash simulation: Introducing traffic safety indicators based on the
explicit simulation of potential driver errors,” Simulation Modelling
Practice and Theory, vol. 94, pp. 215–236, 2019.
[3] A. Rasouli, I. Kotseruba, and J. K. Tsotsos, “Agreeing to cross: How
drivers and pedestrians communicate,” CoRR, vol. abs/1702.03555,
2017. [Online]. Available: http://arxiv.org/abs/1702.03555
[4] T. Lagström and V. Malmsten Lundgren, “Avip-autonomous vehicles’
interaction with pedestrians-an investigation of pedestrian-driver com-
munication and development of a vehicle external interface,” Master’s
thesis, 2016.
[5] A. Rasouli and J. K. Tsotsos, “Autonomous vehicles that interact with
pedestrians: A survey of theory and practice,” IEEE transactions on
intelligent transportation systems, vol. 21, no. 3, pp. 900–918, 2019.
[6] R. Tian, E. Y. Du, K. Yang, P. Jiang, F. Jiang, Y. Chen, R. Sherony, and
H. Takahashi, “Pilot study on pedestrian step frequency in naturalistic
driving environment,” in 2013 IEEE Intelligent Vehicles Symposium
(IV). IEEE, 2013, pp. 1215–1220.
[7] R. Tian, L. Li, K. Yang, S. Chien, Y. Chen, and R. Sherony,
“Estimation of the vehicle-pedestrian encounter/conflict risk on the
road based on tasi 110-car naturalistic driving data collection,” in 2014
IEEE Intelligent Vehicles Symposium Proceedings. IEEE, 2014, pp.
623–629.
[8] N. M. Zafri, R. Sultana, M. R. H. Himal, and T. Tabassum, “Factors
influencing pedestrians’ decision to cross the road by risky rolling
gap crossing strategy at intersections in dhaka, bangladesh,” Accident
Analysis & Prevention, vol. 142, p. 105564, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0001457519318615
[9] N. Pennycooke, “Aevita: Designing biomimetic vehicle-to-pedestrian
communication protocols for autonomously operating & parking on-
road electric vehicles,” Ph.D. dissertation, Massachusetts Institute of
Technology, 2012.
[10] B. Färber, “Communication and communication problems between
autonomous vehicles and human drivers,” in Autonomous driving.
Springer, 2016, pp. 125–144.
[11] M. Electric, “Mitsubishi electric introduces road-
illuminating directional indicators.” 2015. [Online]. Available:
https://www.mitsubishielectric.com/news/2015/1023.html
[12] M. Clamann, M. Aubert, and M. L. Cummings, “Evaluation of vehicle-
to-pedestrian communication displays for autonomous vehicles,” Tech.
Rep., 2017.
[13] Umbrellium, “Starling crossing is an interactive pedestrian crossing
that responds dynamically in real-time to make pedestrians, cyclists
& drivers safer and more aware of each other.” 2020. [Online].
Available: https://umbrellium.co.uk/projects/starling-crossing/
[14] M. Zhang, A. Abbas-Turki, A. Lombard, and A. Koukam, “Connected
and autonomous vehicles cooperate with the pedestrian in industrial
sites based on trajectory optimization and vehicle signalization sys-
tem,” in 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, pp.
188–194.
[15] M. Zhang, A. Abbas-Turki, A. Lombard, A. Koukam, and K.-H.
Jo, “Autonomous vehicle with communicative driving for pedestrian
crossing: Trajectory optimization,” in 2020 IEEE 23rd International
Conference on Intelligent Transportation Systems (ITSC). IEEE,
2020, pp. 1–6.
[16] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa,
D. Silver, and D. Wierstra, “Continuous control with deep reinforce-
ment learning,” arXiv preprint arXiv:1509.02971, 2015.
[17] Y. Ye, X. Zhang, and J. Sun, “Automated vehicle’s behavior decision
making using deep reinforcement learning and high-fidelity simulation
environment,” Transportation Research Part C: Emerging Technolo-
gies, vol. 107, pp. 155–170, 2019.