Optimal Safety Planning and Driving Decision-Making for Multiple Autonomous Vehicles: A Learning Based Approach

2021 International Conference on Emerging Technology in Computing, Communication and Electronics
(ETCCE)
Abu Jafar Md Muzahid
Faculty of Computing
Universiti Malaysia Pahang
26600, Pekan, Pahang, Malaysia
mrumi98@gmail.com
Md. Abdur Rahim
Department of Mechanical Engineering
Universiti Malaysia Pahang
26300, Gambang, Pahang, Malaysia
mdabdurrahim.me2k7@gmail.com
Saydul Akbar Murad
Faculty of Computing
Universiti Malaysia Pahang
26600, Pekan, Pahang, Malaysia
saydulakbarmurad@gmail.com
Syafiq Fauzi Kamarulzaman
Faculty of Computing
Fellow of Automotive Engineering Center
Universiti Malaysia Pahang
26600, Pahang, Malaysia
syafiq29@ump.edu.my
Md Arafatur Rahman
School of Mathematics and Computer Science
Senior Lecturer
University of Wolverhampton, UK
Arafatur.rahman@wlv.ac.uk
Arafatur.rahman@ieee.org
Abstract—In the early diffusion stage of autonomous vehicle
systems, controlling vehicles through precise decision-making to
reduce the number of collisions is a major problem. This paper offers
a deep reinforcement learning (DRL)-based safety planning
decision-making scheme for emergencies that lead to both first and
multiple collisions. Firstly, the lane-changing process and braking
method are thoroughly analyzed, taking into account the critical
aspects of developing an autonomous driving safety scheme. Secondly,
we propose a DRL strategy that specifies the optimum driving
techniques. We use a multiple-goal reward system to balance the
accomplishment rewards from cooperative and competitive approaches,
accident severity, and passenger comfort. Thirdly, the deep
deterministic policy gradient (DDPG), a basic actor-critic (AC)
technique, is used to mitigate the multiple-collision problem. This
approach can improve the efficacy of the optimal strategy while
remaining stable for continuous control mechanisms. In an emergency,
the agent car can adopt optimum driving behaviors to enhance driving
safety once its strategies are adequately trained. Extensive
simulations show our concept’s effectiveness and worth in learning
efficiency, decision accuracy, and safety.
Keywords— Autonomous driving, Multiple vehicle collision,
Robotics, Reinforcement Learning.
I. INTRODUCTION
Advances in AI technology have enhanced traffic efficiency and safety
while also opening the road for autonomous vehicles. Algorithms
capable of handling complicated scenarios are required to build the
next generation of driver assistance systems or autonomous driving
systems. Many researchers have offered approaches to perception, threat
assessment [1], decision making, and vehicle control. However, one
of the key impediments in autonomous driving is the decision-making
process in critical situations. The difficulty in evaluating driving
behavior is that most solutions are restricted to avoiding a
single-vehicle collision without reliable trajectory forecasts of
other participants. To address this difficulty, this research focuses
on developing a safety planning decision-making scheme for autonomous
automobiles in multi-vehicle crash scenarios. Numerous groups have
looked into the problems of making solid strategic decisions for
autonomous vehicles in a congested and dynamic urban setting. Figure 1
shows the evaluation of a multiple-vehicle collision and the timing of
the safety planning interventions. The scheme creates an optimal safety
strategy based on reinforcement learning to prevent the impending first
and chain collisions and reduce the severity of multiple crashes [2].
The problem of multi-vehicle collision resolution during unexpected
deceleration and lane change is described as a multi-objective
optimization problem (MOP) [3], with acceleration as the single
decision variable. Our research intends to design a cooperative
planning scheme for collision prevention that produces sequences of
maneuver decisions in real time [4]. Reinforcement learning is a
strategy for assessing actions taken in any given state by learning an
approximate value function, and it is employed to form the overall
decision-making process in our system. Combinations of the position and
the orientation of both vehicles are considered system states, whereas
combinations of the movements of both vehicles are characterized as
actions. Because the state-action pairs are multidimensional
978-1-6654-8364-3/21/$31.00 ©2021 IEEE
2021 Emerging Technology in Computing, Communication and Electronics (ETCCE) | 978-1-6654-8364-3/21/$31.00 ©2021 IEEE | DOI: 10.1109/ETCCE54784.2021.9689820
Authorized licensed use limited to: Universiti Malaysia Pahang. Downloaded on January 31,2022 at 01:07:10 UTC from IEEE Xplore. Restrictions apply.
[Figure 1 diagram labels: time-to-collision axis with stages Collision warning (CW), Partial braking (PB1, PB2), Full braking (FB)]
Figure 1: Evaluation of Multiple Collisions Caused by a Sudden
Lane Change, Where the Ego Vehicle Receives a CW (Collision
Warning) From the Leading Vehicle and Immediately Determines
PB1 (Partial Brake Time 1) and PB2 (Partial Brake Time 2),
Then Finally the FB (Full Brake Time).
and continuous, reinforcement learning aids the difficult task of
determining the value function of this multidimensional problem.
Our scheme is innovative in that the collision-avoidance safety
planning challenge is articulated as a sequential decision problem
in a continuous multidimensional structure [5] and addressed via
reinforcement learning within a continuous action space. The work's
key contribution is the presentation of two solutions to
collision-avoidance multi-objective optimization problems. The deep
deterministic policy gradient (DDPG) algorithm of reinforcement
learning with continuous actions is applied in both the cooperative [6]
and competitive settings to maximize the overall driving benefits.
Because the deterministic policy gradient is the expected gradient of
the action-value function, it can be estimated significantly more
reliably than a typical stochastic policy gradient. The suggested
decision-making approach and calculation algorithms were evaluated in
numerous typical scenarios using a simulation and verification platform
based on the Unity3D Game Engine with ML-Agents and a Python API. The
key contributions of this paper are as follows:
1) To avoid multi-vehicle crashes in emergency and severe
conditions, a general decision-support safety scheme for
multiple autonomous vehicle driving is presented, which combines
two alternative driving techniques and retains the efficiency of
the driving strategy.
2) To keep the steering and acceleration of the ego vehicle bounded,
a novel driving strategy is developed.
3) The new strategy is created and tested using the Unity3D Game
Engine, and the associated performance outcomes are assessed.
The rest of the paper is organised as follows. Prior work in this
domain is reviewed in Section II. The approach employed in this study
is discussed in Section III, which includes an overview of the deep
deterministic policy gradient algorithm for reinforcement learning and
the simulation setup and model training details. Section IV introduces
the simulation verification platform for evaluating the proposed
method’s efficacy and reliability. Finally, conclusions and future work
are provided in Section V.
II. RELATED WORKS
Collision avoidance for multiple vehicles is a widely studied topic
in academia [7]. The majority of the early research focused on
two-dimensional safe path planning for a group of autonomous vehicles
attempting to avoid stationary objects. Researchers have recently
focused on collision avoidance between vehicles. Some approaches [8]
treat other vehicles as movable obstacles for each vehicle. By
projecting their measured velocities, one can estimate where other
vehicles will be in the future and prevent collisions accordingly. The
work in [8] presented a collision-free approach to navigating a
collection of independent unmanned vehicles. The individual position
and orientation information is translated into a navigation variable to
provide the navigation function. However, that approach alone cannot
restrict the vehicle’s continuous speed or turning radius. Since an
operation that is safe at the current time step may lead to future
collisions, vehicles could have to alter direction immediately, which
in many realistic circumstances is not possible due to the vehicles’
kinematic limits. Other investigations [9] use parametric curves to
shape paths along which every movable object in the environment can
travel smoothly and ultimately reach its target. For vehicles to trace
these routes, however, their speed and direction must constantly vary,
and the size of the change is large and impractical. We therefore
assume that cars travel at a constant speed and gradually shift their
orientation along circular arcs, so that the proposed algorithm is easy
to execute in real time in large scenarios for safe path planning.
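As an illustration of the velocity-projection idea described above, the following minimal sketch estimates when and how closely two vehicles will approach each other if both keep their currently measured velocities. It is not taken from [8]; the function name `closest_approach` and the planar constant-velocity assumption are ours, introduced only for illustration.

```python
import math


def closest_approach(p1, v1, p2, v2):
    """Return (t_star, d_min) for two vehicles moving at constant velocity.

    p1, p2: current (x, y) positions in metres (hypothetical inputs).
    v1, v2: measured (vx, vy) velocities in metres per second.
    t_star is the future time of closest approach (never negative),
    d_min is the separation at that time.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]  # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # Same velocity: the separation never changes.
        t_star = 0.0
    else:
        # Minimise |r + v t|^2 over t, then clamp to the future (t >= 0).
        t_star = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    dx, dy = rx + vx * t_star, ry + vy * t_star
    return t_star, math.hypot(dx, dy)
```

A planner could compare `d_min` with a safety margin and `t_star` with the available reaction time; both thresholds would have to be chosen for the specific vehicles and are not specified in this paper.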
In previous related research, each vehicle calculates its near-optimal
path and plans its motion independently by following a set of local
rules [10], [11]. In [12], a localized route planning technique for
soccer robots is provided; the turning radius limits and the robot’s
speed constraints are explored. In many circumstances, however, a
localized path planning scheme [13] cannot manage arbitrary traffic
because of the kinematic unpredictability of the individuals concerned.
Heuristic techniques such as the genetic algorithm are used in some
studies to resolve this challenge by including all vehicles in the
scheme [14]. For instance, the research in [15] proposes an optimal
crash avoidance strategy between multiple robots, enabling them to
avoid prospective collisions without causing any new collision.
However, it is difficult for a vehicle to alter its route if it faces
several potential crashes, since the system requires tremendous
computational work [16], [17].
In any dynamic and continuous autonomous driving situation,
when confronted with the challenge of safety planning, artificial
potential field (APF) techniques [18] are formulated with continuous,
static potential-field equations. APF systems for control and
navigation can be quickly deployed and executed in real time. These
algorithms attempt to reach an objective by employing virtual forces,
which attract the vehicle toward the goal or push it away from
impediments on the trajectory. One advantage of this approach is that
it can take account of various constraints solely by adding particular
forces. To utilize APFs in distinct autonomous driving scenarios,
different new potentials have been proposed depending on road
structure, vehicle physics [19], and intersections. Various
enhancements have been suggested to address additional limitations of
the classic autonomous driving APF algorithm. To prevent the
local-minimum problem, the modified APF model can also include a
virtual obstacle or an adjusted target-point location. The
close-obstacle problem can be addressed by changing the computation of
the APF using fuzzy logic. A further artificial friction force to
reduce oscillations was introduced in [20], [21]. While APF algorithms
may be sufficient to reach the outcome, the vehicle’s final trajectory
is unpredictable, which can result in hazardous scenarios.
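The virtual-force idea behind APF methods can be sketched as follows. This is a generic textbook-style formulation (a linear attractive force toward the goal plus a short-range repulsive force around each obstacle), not the specific potentials of [18]-[21]; the gains `k_att` and `k_rep`, the influence radius `d0`, and the function name `apf_force` are assumptions chosen for illustration.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=10.0):
    """Return the net virtual force (fx, fy) acting on a vehicle at pos.

    pos, goal: (x, y) tuples; obstacles: iterable of (x, y) tuples.
    k_att, k_rep, d0 are illustrative tuning values, not from the paper.
    """
    # Attractive part: pulls the vehicle straight toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Repulsive part: only active inside the influence radius d0.
        if 0.0 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

Because the two parts simply add, further constraints can be included as extra force terms, which is the advantage noted above. The same additivity also explains the local-minimum problem: wherever attraction and repulsion cancel, the net force is zero and the vehicle stalls before reaching the goal.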
Another essential drawback of this approach is that it is difficult
to consider the vehicle’s kinematic restrictions [22], so the
mechanical feasibility of paths cannot be ensured. Elastic band (EB)
methods are likewise derived from physical analogies. The anticipated
path to the goal is represented by a succession of springs, which can
be deformed in response to environmental changes. The EB’s intrinsic
forces constrain neighbouring path nodes, although the approach
struggles with exact kinematic constraints. Choosing the final point of
the trajectory is likewise a challenging question in an emergency. In
autonomous driving, vehicle control [23] is responsible for following
the theoretical trajectory predicted by the planning algorithm [24],
[25]. Vehicle kinematic models, such as the bicycle model, have been
employed to translate this trajectory (x(t), y(t)) into a series of
commands, mostly steering angle and acceleration. Several control
methods have been used for comparatively low-speed driving
environments, including PID controllers,
pure pursuit controllers, and Stanley controllers. At high speeds
or with a significant curvature change rate, dynamic model-based
control approaches perform better. Nonlinear control, Model Predictive
Control, and feedback-feedforward control can boost vehicle stability
at high speeds. These methods, however, presuppose that the traffic
environment is fully known, including the intentions of other road
users. In view of environmental unpredictability, the safety
decision-making task is typically modelled as a Markov decision
process, sometimes a partially observable one, and has been applied to
numerous driving scenarios. The primary theme of the safety planning
method is the planning of driving manoeuvres, i.e. the development of
optimum driving behaviours for a particular scenario, based on the
trajectories of participant vehicles. The rapid growth of machine
learning has allowed classical methods to be combined with
learning-based methods, most recently reinforcement learning, to make
autonomous decisions in highly interactive environments.
[Figure 2 diagram labels: Lane Keeping, Lane Change, Environment]
Figure 2: Design of Training Environment for the SPS (Safety
Planning Scheme) Mechanism to Avoid Multiple Vehicle Collisions.
III. METHODOLOGY
This study is based on the concept of cooperative and competitive
strategies among multiple autonomous agent vehicles. Building a
conceptual framework that involves perception, communication and
cooperation, threat assessment, decision making, and finally the
vehicle control modules requires certain components as part of our core
project. A multi-constrained problem, including risk prediction, is
resolved by an optimal safety planning application that mitigates
multiple-vehicle collisions. As far as the general architecture is
concerned, we develop an environment scenario in which the ego vehicle
takes safety decisions based on upcoming obstacles, choosing between
lane-maintenance and lane-change decisions; the overall design is shown
in Figure 2.
A. Deep Deterministic Policy Gradient
The learning method resembles human learning and is modelled as a
Markov decision process (S, A, P, R). DeepMind proposed the DQN
algorithm in 2013, which opened a new era in deep reinforcement
learning. The key enhancements of the algorithm are the use of
experience replay and the construction of a second, target network to
break the correlation among training samples and increase training
stability [26]. Several algorithms developed from DQN have made
significant advances on discrete action space problems. However,
continuous control remains quite challenging. DeepMind proposed the
DDPG method, based on the DPG and DQN algorithms, in 2015, bringing the
normalization process used in deep learning into the algorithm.
Experiments have shown that the approach works effectively on numerous
types of continuous control problems. DDPG is an actor-critic
technique.
[Figure 3 diagram labels: Parameter Distribution, Communication [VANET], Collision Threat Prediction, DDPG Agent]
Figure 3: Demonstration of SPS (Safety Planning Scheme)
Agent.
In an actor-critic algorithm, the actor function π(s|µ) produces an
action given the current state. The critic evaluates an action-value
function Q(s, a|A) on the basis of the actor's output and the current
state. The TD errors produced by the critic drive learning in the
critic network, and the actor network is then updated on the basis of
the policy gradient. The DDPG algorithm merges the benefits of the
actor-critic and DQN algorithms to facilitate convergence. In
particular, DDPG adopts the target network idea from DQN [27].
Following this approach, we build a training agent, which is
illustrated in Figure 3.
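The roles of the TD error and the target network can be made concrete with a minimal sketch of three building blocks of a DDPG-style update. It assumes scalar rewards, parameters stored as flat lists of floats, and illustrative values for the discount factor `gamma` and the soft-update rate `tau`; none of these names or values are taken from the implementation used in this paper, and the actor's policy-gradient step is omitted.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Target for the critic: y = r + gamma * Q_target(s', actor_target(s')).

    next_q is the target critic's value for the next state and the target
    actor's action; terminal transitions (done=True) do not bootstrap.
    """
    return reward + (0.0 if done else gamma * next_q)


def critic_loss(q_values, targets):
    """Mean squared TD error over a minibatch sampled from the replay buffer."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, online_params, tau=0.001):
    """Move target parameters slowly toward the online ones:
    theta_target <- tau * theta_online + (1 - tau) * theta_target.
    """
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]
```

In a full agent, the critic would be trained to minimise `critic_loss`, the actor would be updated along the gradient of the critic's output with respect to the action, and `soft_update` would be applied to both target networks after every training step. The slow-moving targets are what keeps the bootstrapped value `y` from chasing its own updates.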
B. Experimental Setup
Extensive experiments are conducted to quantify the two main
autonomous driving metrics, namely the total rewards, which show
the overall success of our scheme, and the number of collisions,
across both the cooperative and competitive approaches. We use the
Unity3D game engine to create the environment depicted in Figure 2 to
model the entire driving system. The road is designed to be
approximately 886 m long and 15 m wide. We progressively introduced
vehicles to the road and observed the performance in relation to their
learning behavior. For example, two conventional vehicles (CVs) operate
on the highway in these simulations. We then vary the number of
DDPG-equipped autonomous driving vehicles (AVs) for testing the
proposed driving
scheme. To define the reward function, we follow Figure 4 and the
equation of the vehicle motion and velocity model used by Sugiyama and
Nagatani [28]:
$$\frac{d^2 z_i}{dt^2} = a\left\{ V(\Delta z_i) - \frac{dz_i}{dt} \right\}$$
where $V(\Delta z_i)$ is the optimal-velocity function, $z_i(t)$ is the
position of the $i$th vehicle at time $t$,
$\Delta z_i(t) = z_{i+1}(t) - z_i(t)$ is the headway of vehicle $i$ at
time $t$, and $a$ is the sensitivity (the inverse of the delay time).
This is achieved by comparing the performance of different simulations.
To realise the cooperative and competitive approaches, the reward
values are altered, and each vehicle’s communication range R is 80 m,
i.e. [0, 80] meters. The vehicles' speeds range from 80 km/h to
120 km/h. The values of the other parameters are defined as steering [-45,45],
acceleration [0,1], brake [0,1], angle [-90,90]. In TensorFlow, we
built the driving policy using networks with two hidden layers and
non-linear activation functions in order to obtain the optimum policy.
The two layers have about 300 and 400 neurons, respectively. The
learning rate is 1e-6, and the batch size is 32. To compare the
performance of our proposed approach, we apply the decentralized
collision avoidance policy for multi-robot systems suggested by Long et
al. [29], referred to here as POMDP. POMDP employs a multi-agent deep
reinforcement learning architecture to enable several robots to develop
an ideal collision avoidance technique. An experimental episode
corresponds to a stage on the circuit and hence a whole race from the
beginning to the end. However, the race ends early in other cases, such
as when an agent turns back or leaves the track edge owing to an
accident. As a leading agent, a vehicle sends its learned model
parameters to its following agents within one local network, while the
following agents wait idly for the parameters learned by their leading
agent. As we developed a non-deterministic autonomous driving
environment, we performed 10 experiments and averaged the results over
all runs.
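To show how the optimal-velocity equation from [28] quoted above behaves, the following sketch integrates it with an explicit Euler step for a line of vehicles. The text does not give the form of the optimal-velocity function, so a commonly used tanh-shaped function is assumed here; the values of `v_max`, `x_c`, `a` and `dt`, the dimensionless units, and the choice that the lead vehicle keeps its current speed are likewise illustrative assumptions rather than the settings of our simulations.

```python
import math


def optimal_velocity(headway, v_max=2.0, x_c=4.0):
    """Assumed tanh-shaped optimal velocity: 0 at zero headway, rising
    toward roughly v_max as the headway grows beyond the safety distance x_c."""
    return 0.5 * v_max * (math.tanh(headway - x_c) + math.tanh(x_c))


def step(z, v, a=1.0, dt=0.1):
    """Advance positions z and speeds v by one explicit Euler step.

    Vehicle i follows vehicle i + 1, so the headway is z[i + 1] - z[i].
    The last vehicle in the list is the leader and keeps its speed.
    """
    n = len(z)
    acc = [0.0] * n
    for i in range(n - 1):
        headway = z[i + 1] - z[i]
        # d^2 z_i / dt^2 = a * (V(headway) - dz_i/dt)
        acc[i] = a * (optimal_velocity(headway) - v[i])
    new_z = [z[i] + v[i] * dt for i in range(n)]
    new_v = [v[i] + acc[i] * dt for i in range(n)]
    return new_z, new_v
```

Calling `step` repeatedly with a stationary leader, for example `z, v = step(z, v)` inside a loop, reproduces the sudden-blockage situation of Figure 4: each follower decelerates as its headway shrinks, and a collision corresponds to a headway reaching zero before the follower's speed does.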
[Figure 4 diagram labels: Vehicle 1, Blockage, Vehicle 2, Vehicle 3, Vehicle 4; axis x with origin x = 0]
Figure 4: Presentation of Rewards Function Regarding The
Conditions of Multiple Vehicle Collision by Sudden Slowdown.
IV. RESULT & DISCUSSION
During simulations in Unity3D, the agents learn the optimal driving
behaviour for mastering collision avoidance using the proposed driving
scheme. The objective of each agent is to improve its conduct
dynamically by learning to prevent collisions with other agents and
nearby objects. In the meantime, its time of arrival is also minimised.
An autonomous vehicle system must scale efficiently as the number of
participants fluctuates. We thus test the system’s scalability at
different densities of participating AVs. To assess driving
performance, we employed two conventional vehicles (CVs) and one
autonomous vehicle (AV) equipped with DDPG. The AV agents' performance
is evaluated mainly by the number of collisions suffered by the agents
and the rewards achieved during tests, presented in Figure 5 and
Figure 6. Results are reported for the cooperative approach and for the
competitive approach, where the driving scheme includes two CVs with
one and two AVs running DDPG. Figure 6 presents the average number of
collisions over a span of 500 episodes during the training process. In
this figure, every collision point is obtained by summing over every 50
episodes. From the data, we note that the average number of collisions
in all scenarios decreases as the number of episodes rises. The average
number of collisions under the competitive approach is approximately
211% higher than under the cooperative strategy. The reason is that the
parameter distribution approach of DDPG is used in the system to
increase the AVs' performance. The DDPG rewards of nearby autonomous
vehicles in a communication network are compared against each vehicle’s
optimal driving policy. The highest reward represents the best driving
behaviour of the autonomous vehicles. Furthermore, Figure 6 shows that
DDPG’s competitive approach requires more training time to reach zero
collisions. The zero-collision objective is reached 13% faster in the
training phase with DDPG’s cooperative approach.
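The aggregation behind each plotted collision point, and the meaning of a statement such as "211% higher", can be sketched as follows. The helper names `bin_sums` and `percent_higher` are hypothetical and the code is not the analysis script used for the figures; it only illustrates summing per-episode collision counts over blocks of 50 episodes and computing a relative difference.

```python
def bin_sums(collisions_per_episode, bin_size=50):
    """Sum per-episode collision counts over consecutive blocks of episodes.

    With 500 episodes and bin_size=50 this yields 10 plotted points.
    """
    return [sum(collisions_per_episode[i:i + bin_size])
            for i in range(0, len(collisions_per_episode), bin_size)]


def percent_higher(value, baseline):
    """Relative difference of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline
```

Note that a value that is 211% higher than a baseline is 3.11 times the baseline, not 2.11 times, so under this reading the competitive approach suffers roughly three times as many collisions on average as the cooperative one.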
Figure 5: For Each Approach, The Average Episode Rewards
Against Time Step.
Figure 6: For Each Approach, The Number of Collisions Per
Episode.
The rewards gained by autonomous driving vehicles are measured, and
the results are shown in Figure 5. The results show the sum of the
rewards recorded over all time steps of training. The average reward of
the safety scheme increases as the number of episodes grows. At first,
the reward is low in all situations, because each autonomous vehicle is
initialised with random learning parameters; the reward of the
competitive approach in particular is very low for the first few
episodes. The vehicles therefore cannot choose the right action for
their next move, leading to chain crashes. The cooperative strategy
penalizes unsuccessful actions by reducing rewards, allowing DDPG to
learn from its mistakes. Then, depending on its previous experiences,
it may choose the correct action in a future episode. As the agent’s
learning experience improves with each episode, the rewards begin to
rise. The reason is that the cars have learnt to take appropriate
actions to prevent collisions. In this situation, rewards are given to
encourage the desired driving behaviour and penalties to limit
undesired behaviour. We note that DDPG’s competitive approach has a
lower reward than the cooperative strategy over a period of 500
episodes. The average reward earned by autonomous driving vehicles
under the cooperative approach is approximately 33% greater than that
of the competitive DDPG driving vehicles. In addition, when a collision
is possible, the DDPG algorithm also assigns penalties when the agents
move too close together. As a result, the agents no longer collide with
other vehicles because they have acquired better policies.
V. CONCLUSIONS
In this paper, we analysed in depth the needs and design objectives
of multiple autonomous vehicles. To achieve these objectives, we have
presented an effective safety planning scheme. We have devised an
efficient collision prevention technique using a multi-actor hierarchy.
The DDPG rewards are defined by taking into account collision avoidance
under the cooperative and competitive approaches, minimising time of
arrival, and lane maintenance. The reward's value indicates the quality
of a driving action; for example, a higher reward indicates optimum
safe driving behaviour. To speed up the learning process, the parameter
distribution approach is used over a private communication channel in a
leader-follower multi-agent arrangement. Through rigorous testing in
Unity3D, we have demonstrated that the safety scheme efficiently
reduces the number of collisions and scales with an increasing number
of autonomous vehicles. We also observed careful safety learning, in
which vehicles learnt not to collide with other vehicles but instead to
leave the lane and stop. This improved driving behaviour while lowering
danger and liability. As part of our future work, we intend to improve
safety by incorporating camera-based images or videos of the
environment into the learning process. In addition, we are considering
enhancing the safety scheme to adapt it to situations where roads
impose no speed range.
VI. ACKNOWLEDGMENT
The authors would like to thank Universiti Malaysia Pahang for
laboratory facilities as well as additional financial support under
Internal Research grant RDU192202. Further thanks go to the Ministry of
Higher Education of Malaysia for providing financial support under
Fundamental Research Grant Scheme (FRGS) No. FRGS/1/2018/TK08/UMP/02/2
(University reference RDU190137).
REFERENCES
[1] T. B. Sarwar, N. M. Noor, M. S. U. Miah, M. Rashid, F. Al Farid,
and M. N. Husen, “Recommending research articles: A multi-level
chronological learning-based approach using unsupervised keyphrase
extraction and lexical similarity calculation,” IEEE Access, 2021.
[2] S. F. Kamarulzaman and S. Yasunobu, “Cooperative multi-knowledge
learning control system for obstacle consideration,” in International
Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems. Springer, 2014, pp. 506–515.
[3] A. Karim, M. A. Islam, P. Mishra, A. J. M. Muzahid, A. Yousuf,
M. M. R. Khan, and C. K. M. Faizal, “Yeast and bacteria co-culture-
based lipid production through bioremediation of palm oil mill effluent:
a statistical optimization,” Biomass Conversion and Biorefinery, pp. 1–
12, 2021.
[4] S. A. Murad, Z. R. M. Azmi, Z. H. Hakami, N. J. Prottasha, and
M. Kowsher, “Computer-aided system for extending the performance
of diabetes analysis and prediction,” in 2021 International Conference
on Software Engineering & Computer Systems and 4th International
Conference on Computational Science and Information Management
(ICSECS-ICOCSIM). IEEE, 2021, pp. 465–470.
[5] M. Kowsher, A. Tahabilder, and S. A. Murad, “Impact-learning: a robust
machine learning algorithm,” in Proceedings of the 8th international
conference on computer and communications management, 2020, pp.
9–13.
[6] M. A. Rahim, M. A. Rahman, M. M. Rahman, A. T. Asyhari, M. Z. A.
Bhuiyan, and D. Ramasamy, “Evolution of iot-enabled connectivity and
applications in automotive industry: A review,” Vehicular Communica-
tions, vol. 27, p. 100285, 2021.
[7] S. F. Kamarulzaman and H. Al Sibai, “Compound learning control
for formation management of multiple autonomous agents.” Pertanika
Journal of Science & Technology, vol. 25, 2017.
[8] A. Widyotriatmo and K.-S. Hong, “Navigation function-based control of
multiple wheeled vehicles,” IEEE Transactions on Industrial Electronics,
vol. 58, no. 5, pp. 1896–1906, 2011.
[9] M. H. Alsibai, S. F. Kamarulzaman, H. A. Alfarra, and Y. H. Naif, “Real
time emergency auto parking system in driver lethargic state for accident
preventing,” in MATEC Web of Conferences, vol. 90. EDP Sciences,
2017, p. 01034.
[10] M. A. Rahim, M. A. Rahman, M. M. Rahman, A. T. Asyhari, M. Z. A.
Bhuiyan, and D. Ramasamy, “Evolution of iot-enabled connectivity and
applications in automotive industry: A review,” Vehicular Communica-
tions, vol. 27, p. 100285, 2021.
[11] Y. S. Chi and S. F. Kamarulzaman, “Intelligent gender recognition
system for classification of gender in malaysian demographic,” in
InECCE2019: Proceedings of the 5th International Conference on Elec-
trical, Control & Computer Engineering, Kuantan, Pahang, Malaysia,
29th July 2019, vol. 632. Springer Nature, 2020, p. 283.
[12] K. Jolly, R. S. Kumar, and R. Vijayakumar, “A bezier curve based path
planning in a multi-agent robot soccer system without violating the
acceleration limits,” Robotics and Autonomous Systems, vol. 57, no. 1,
pp. 23–33, 2009.
[13] W. S. Cheong, S. F. Kamarulzaman, and M. A. Rahman, “Implementa-
tion of robot operating system in smart garbage bin robot with obstacle
avoidance system,” in 2020 Emerging Technology in Computing, Com-
munication and Electronics (ETCCE). IEEE, 2020, pp. 1–6.
[14] K. P. Cheng, R. E. Mohan, N. H. K. Nhan, and A. V. Le, “Multi-
objective genetic algorithm-based autonomous path planning for hinged-
tetro reconfigurable tiling robot,” IEEE Access, vol. 8, pp. 121267–121284, 2020.
[15] B. Li, Y. Ouyang, Y. Zhang, T. Acarman, Q. Kong, and Z. Shao, “Opti-
mal cooperative maneuver planning for multiple nonholonomic robots in
a tiny environment via adaptive-scaling constrained optimization,” IEEE
Robotics and Automation Letters, vol. 6, no. 2, pp. 1511–1518, 2021.
[16] L. C. Kiew, A. J. M. Muzahid, and S. F. Kamarulzaman, “Vehicle route
tracking system based on vehicle registration number recognition using
template matching algorithm,” in 2021 International Conference on
Software Engineering Computer Systems and 4th International Confer-
ence on Computational Science and Information Management (ICSECS-
ICOCSIM), 2021, pp. 249–254.
[17] A. J. M. Muzahid, S. F. Kamarulzaman, and M. A. Rahman, “Compar-
ison of ppo and sac algorithms towards decision making strategies for
collision avoidance among multiple autonomous vehicles,” in 2021 In-
ternational Conference on Software Engineering Computer Systems and
4th International Conference on Computational Science and Information
Management (ICSECS-ICOCSIM), 2021, pp. 200–205.
[18] L. Shangguan, J. A. Thomasson, and S. Gopalswamy, “Motion planning
for autonomous grain carts,” IEEE Transactions on Vehicular Technol-
ogy, vol. 70, no. 3, pp. 2112–2123, 2021.
[19] M. A. Rahim, M. Rahman, M. A. Rahman, A. J. M. Muzahid, and S. F.
Kamarulzaman, “A framework of iot-enabled vehicular noise intensity
monitoring system for smart city,” Advances in Robotics, Automation
and Data Analytics: Selected Papers from ICITES 2020, vol. 1350, p.
194, 2021.
[20] M. Fu, H. G. Franquelim, S. Kretschmer, and P. Schwille, “Non-
equilibrium large-scale membrane transformations driven by minde
biochemical reaction cycles,” Angewandte Chemie International Edition,
vol. 60, no. 12, pp. 6496–6502, 2021.
[21] M. S. I. Shofiqul, N. Ab Ghani, and M. M. Ahmed, “A review on
recent advances in deep learning for sentiment analysis: Performances,
challenges and limitations,” 2020.
[22] S. F. Kamarulzaman and M. H. Alsibai, “Time-change-fuzzy-based
intelligent vehicle control system for safe emergency lane transition
during driver lethargic state,” Advanced Science Letters, vol. 24, no. 10,
pp. 7554–7558, 2018.
[23] J. Odili, M. N. M. Kahar, A. Noraziah, and S. F. Kamarulzaman, “A
comparative evaluation of swarm intelligence techniques for solving
combinatorial optimization problems,” International Journal of Ad-
vanced Robotic Systems, vol. 14, no. 3, p. 1729881417705969, 2017.
[24] A. J. M. Muzahid, S. F. Kamarulzaman, and M. A. Rahim, “Learning-
based conceptual framework for threat assessment of multiple vehicle
collision in autonomous driving,” in 2020 Emerging Technology in
Computing, Communication and Electronics (ETCCE). IEEE, 2020,
pp. 1–6.
[25] M. Rashid, M. Islam, N. Sulaiman, B. S. Bari, R. K. Saha, and M. J.
Hasan, “Electrocorticography based motor imagery movements
classification using long short-term memory (LSTM) based on deep learning
approach,” SN Applied Sciences, vol. 2, no. 2, pp. 1–7, 2020.
[26] M. Miah, J. Sulaiman, T. B. Sarwar, K. Z. Zamli, and R. Jose, “Study of
keyword extraction techniques for electric double-layer capacitor domain
using text similarity indexes: An experimental analysis,” Complexity, vol.
2021, 2021.
[27] M. M. Hasan, M. S. Islam, and S. Abdullah, “Robust pose-based
human fall detection using recurrent neural network,” in 2019 IEEE
International Conference on Robotics, Automation, Artificial-intelligence
and Internet-of-Things (RAAICON). IEEE, 2019, pp. 48–51.
[28] N. Sugiyama and T. Nagatani, “Multiple-vehicle collision induced by
a sudden stop in traffic flow,” Physics Letters A, vol. 376, no. 22, pp.
1803–1806, 2012.
[29] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang, and J. Pan, “Towards
optimally decentralized multi-robot collision avoidance via deep rein-
forcement learning,” in 2018 IEEE International Conference on Robotics
and Automation (ICRA). IEEE, 2018, pp. 6252–6259.