2021 International Conference on Emerging Technology in Computing, Communication and Electronics
(ETCCE)
Optimal Safety Planning and Driving
Decision-Making for Multiple Autonomous
Vehicles: A Learning Based Approach
Abu Jafar Md Muzahid
Faculty of Computing
Universiti Malaysia Pahang
26600, Pekan, Pahang, Malaysia
mrumi98@gmail.com
Md. Abdur Rahim
Department of Mechanical Engineering
Universiti Malaysia Pahang
26300, Gambang, Pahang, Malaysia
mdabdurrahim.me2k7@gmail.com
Saydul Akbar Murad
Faculty of Computing
Universiti Malaysia Pahang
26600, Pekan, Pahang, Malaysia
saydulakbarmurad@gmail.com
Syafiq Fauzi Kamarulzaman
Faculty of Computing
Fellow of Automotive Engineering Center
Universiti Malaysia Pahang
26600, Pahang, Malaysia
syafiq29@ump.edu.my
Md Arafatur Rahman
School of Mathematics and Computer Science
Senior Lecturer
University of Wolverhampton, UK
Arafatur.rahman@wlv.ac.uk
Arafatur.rahman@ieee.org
Abstract—In the early diffusion stage of autonomous vehicle
systems, the controlling of vehicles through exacting decision-
making to reduce the number of collisions is a major problem.
This paper offers a DRL-based safety planning decision-making
scheme in an emergency that leads to both the first and multiple
collisions. Firstly, the lane-changing process and braking method
are thoroughly analyzed, taking into account the critical aspects
of developing an autonomous driving safety scheme. Secondly,
we propose a DRL strategy that specifies the optimum driving
techniques. We use a multiple-goal reward system to balance
the accomplishment rewards from cooperative and competitive
approaches, accident severity, and passenger comfort. Thirdly,
the deep deterministic policy gradient (DDPG), a basic actor-critic (AC) technique, is used to mitigate the multiple-collision problem. This approach can improve the efficacy of the optimal strategy while remaining stable for continuous control mechanisms. In an emergency, the agent car can adopt optimum driving behaviors to enhance driving safety when its strategies are adequately trained. Extensive simulations show our concept’s effectiveness and worth in learning efficiency, decision accuracy, and safety.
Keywords— Autonomous driving, Multiple vehicle collision,
Robotics, Reinforcement Learning.
I. INTRODUCTION
Advances in AI technology have enhanced traffic efficiency and safety while also opening the road for autonomous vehicles. Algorithms capable of handling complicated scenarios are required to build the next generation of driver assistance systems or autonomous driving systems. Many researchers have proposed methods for perception, threat assessment [1], decision making, and vehicle control. However, one of the key impediments in autonomous driving is the decision-making process in critical situations. The difficulty in evaluating driving behavior is that most solutions are restricted to avoiding a single-vehicle collision without reliable trajectory forecasts of other participants. To address this difficulty, this research focuses on developing a safety planning decision-making scheme for autonomous automobiles in multi-vehicle crash scenarios. Numerous groups have looked into the problem of making sound strategic decisions for autonomous vehicles in a congested and dynamic urban setting. Figure 1 shows the evaluation of a multiple vehicle collision and the timing of the avoidance interventions in safety planning. The scheme creates an optimal safety strategy based on reinforcement learning to prevent the impending first and chain collisions and reduce the severity of multiple crashes [2]. The problem of multi-vehicle collision resolution during unexpected deceleration and lane change is described as a multi-objective optimization problem (MOP) [3], with acceleration as the single decision variable. Our research intends to design a cooperative planning scheme for collision prevention that produces sequences of maneuver decisions in real time [4].
Reinforcement learning is the strategy used for assessing actions taken in any given state by learning an approximate value function, and it is employed to form the overall decision-making process in our system. Combinations of the position and the orientation of both vehicles are considered system states, whereas combinations of the movements of both vehicles are characterized as actions. Because the state-action pairs are multidimensional and continuous, reinforcement learning aids the difficult task of determining the value function of this multidimensional problem.
2021 Emerging Technology in Computing, Communication and Electronics (ETCCE) | 978-1-6654-8364-3/21/$31.00 ©2021 IEEE | DOI: 10.1109/ETCCE54784.2021.9689820
Authorized licensed use limited to: Universiti Malaysia Pahang. Downloaded on January 31,2022 at 01:07:10 UTC from IEEE Xplore. Restrictions apply.
[Figure 1 timeline: time to collision, with the stages collision warning (CW), partial braking (PB1, PB2), and full braking (FB).]
Figure 1: Evaluation of Multiple Collision Caused By Sudden Lane Change, Where The Ego Vehicle Receives CW (Collision Warning) From The Leading Vehicle, Immediately Determines PB1 (Partial Brake Time-1) and PB2 (Partial Brake Time-2), and Finally The FB (Full Brake Time).
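The staged interventions named in Figure 1 can be illustrated with a small sketch that maps a time-to-collision value to a stage. The threshold values, the `TtcThresholds` class, and both function names are hypothetical and chosen only for illustration; the paper does not report numeric thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtcThresholds:
    """Hypothetical time-to-collision thresholds in seconds (not from the paper)."""
    collision_warning: float = 4.0
    partial_brake_1: float = 3.0
    partial_brake_2: float = 2.0
    full_brake: float = 1.0


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Return gap / closing speed, or infinity when the gap is not shrinking."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return gap_m / closing_speed_mps


def intervention_stage(ttc_s: float, th: TtcThresholds = TtcThresholds()) -> str:
    """Map a time-to-collision value to one of the stages named in Figure 1."""
    if ttc_s <= th.full_brake:
        return "FB"
    if ttc_s <= th.partial_brake_2:
        return "PB2"
    if ttc_s <= th.partial_brake_1:
        return "PB1"
    if ttc_s <= th.collision_warning:
        return "CW"
    return "NONE"
```

Under these assumed thresholds, a 30 m gap closing at 10 m/s gives a time to collision of 3 s, which falls in the first partial-braking stage.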
Our scheme is innovative in that the collision-avoidance safety planning challenge is formulated as a sequential decision problem in a continuous multidimensional space [5] and addressed via reinforcement learning within a continuous action space. The key contribution of this work is to address the collision-avoidance multi-objective optimization problem from two perspectives: the deep deterministic policy gradient (DDPG) algorithm of reinforcement learning with continuous actions is applied in the cooperative [6] and competitive settings to maximize the overall driving benefits. Because the deterministic policy gradient is the expected gradient of the action-value function, it can be estimated significantly more reliably than a typical stochastic policy gradient. The suggested decision-making approach and calculation algorithms were evaluated in numerous typical scenarios using a simulation and verification platform based on the Unity3D Game Engine with ML-Agents and the Python API. The following are the key contributions of this paper:
1) In order to avoid multi-vehicle crashes in emergency and severe
conditions, a general decision-support safety scheme for mul-
tiple autonomous vehicle driving is presented, which combines
two alternative driving techniques and retains the efficiency of
the driving strategy.
2) To keep the steering and acceleration of the ego vehicle bounded,
a novel driving strategy is developed.
3) The new strategy is created and tested using the Unity3D Game
Engine, and the associated performance outcomes are assessed.
The rest of the paper is organised as follows. Prior work in this domain is presented in Section II. The approach employed in this study is discussed in Section III, which includes an overview of the deep deterministic policy gradient algorithm for reinforcement learning and the simulation setup and model training details. Section IV presents the results obtained on the simulation verification platform to evaluate the proposed method’s efficacy and reliability. Finally, conclusions and future work are provided in Section V.
II. RELATED WORKS
Collision avoidance for multiple vehicles is a widely studied topic in academia [7]. The majority of the early research focused on two-dimensional safety path planning in the context of a group of autonomous vehicles attempting to avoid stationary objects. Researchers have recently focused on the necessity of collision avoidance between vehicles. Some approaches [8] consider other vehicles to be moving obstacles for each vehicle. By projecting their measured velocities, one can estimate where other vehicles will be in the future and prevent collisions accordingly. The work in [8] presented a collision-free approach to navigate a collection of independent unmanned vehicles. The individual position and orientation information is translated into a navigation variable to provide the navigation function. However, the vehicle’s continuous speed or turning radius cannot be restricted by that method alone. Since a safe maneuver at the current time step may lead to future collisions, vehicles could have to alter direction immediately, which in many realistic circumstances is not possible due to these vehicles’ motion limits. Other investigations [9] use parametric curves to shape the way in which every moving object in the environment may follow a smooth path and ultimately reach its target. For vehicles to trace these routes, however, their speed and direction must constantly vary, and the size of the change is large and not practical. We therefore assume that cars travel at a constant speed and gradually shift their orientation by means of circular arcs, so that the proposed algorithm is easy to execute in real time in large scenarios for safe path planning.
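The velocity-projection idea described above can be sketched as a closest-point-of-approach computation under a constant-velocity assumption. This is an illustrative sketch, not the method of [8]; the function names and the 2 m safety radius are hypothetical.

```python
import math


def closest_approach(p1, v1, p2, v2):
    """Return (t_star, distance) of closest approach for two constant-velocity points.

    p1, p2 are (x, y) positions in metres; v1, v2 are (vx, vy) velocities in m/s.
    t_star is clamped to be non-negative because only the future matters.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]      # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t_star = 0.0                           # same velocity: the gap never changes
    else:
        t_star = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    dx, dy = rx + vx * t_star, ry + vy * t_star
    return t_star, math.hypot(dx, dy)


def will_collide(p1, v1, p2, v2, safety_radius_m=2.0):
    """Flag a predicted conflict when the closest approach is inside the safety radius."""
    _, dist = closest_approach(p1, v1, p2, v2)
    return dist < safety_radius_m
```

For example, a vehicle at 20 m/s that is 50 m behind a vehicle at 10 m/s in the same lane reaches it after 5 s, whereas the same pair in lanes 3.5 m apart is not flagged.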
In previous related research, each vehicle calculates its near-optimal path and plans its motion on its own by following a collection of local rules [10], [11]. In [12], a local route planning technique for soccer robots is provided; the turning radius limits and the robot’s speed constraints are explored. In many circumstances, however, a local path planning scheme [13] cannot manage arbitrary traffic because of the kinematic unpredictability of the individuals concerned. Heuristic techniques such as the genetic algorithm are used in some studies to resolve this challenge by including all vehicles in the scheme [14]. For instance, the research in [15] proposes an optimal crash avoidance strategy between multiple robots, enabling them to avoid prospective collisions without causing any new collision. However, it is difficult for a vehicle to alter its route if it faces several potential crashes, since the system requires tremendous computational work [16], [17].
In any dynamic and continuous autonomous driving situation, when confronted with the challenge of safety planning, artificial potential field (APF) techniques [18] are formulated with continuous, static potential field equations. APF systems for control and navigation can be quickly deployed and executed in real time. These algorithms attempt to reach an objective by employing virtual forces that attract the vehicle toward the goal and push it away from obstacles on the trajectory. One advantage of this approach is that it can take account of various constraints solely by adding particular forces. In order to use APFs in distinct autonomous driving scenarios, different new potentials were proposed depending on road structure, vehicle physics [19], and intersections. Various enhancements have been suggested to address additional limitations of the classic APF algorithm for autonomous driving. To avoid the local minimum problem, the modified APF model can also include a virtual obstacle or a repositioned target point. The close obstacle problem can be addressed by changing the computation of the APF using fuzzy logic. A further artificial friction force to reduce oscillations was introduced in [20], [21]. While APF algorithms may reach an adequate outcome, the vehicle’s final trajectory is unpredictable, which can result in hazardous scenarios.
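The attractive and repulsive virtual forces of an APF planner can be sketched as follows. This is a minimal textbook-style formulation, not the specific potentials of the cited works; the gains `k_att` and `k_rep` and the influence distance are assumptions chosen for illustration.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, influence_m=10.0):
    """Return the (fx, fy) virtual force on a vehicle at `pos`.

    Attractive term: linear pull toward `goal`.
    Repulsive term: push away from each obstacle closer than `influence_m`.
    All gains are illustrative assumptions.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence_m:
            # Negative gradient of 0.5 * k_rep * (1/d - 1/influence)^2,
            # which points away from the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / influence_m) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

With these assumed gains, an obstacle 2 m straight ahead almost exactly cancels the pull of a goal 10 m ahead, which is an instance of the local minimum problem mentioned above.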
Another essential drawback of this approach is that it is difficult to consider the vehicle’s kinematic constraints [22], so the mechanical feasibility of paths cannot be ensured. Elastic band (EB) methods are likewise derived from physical analogies. The anticipated path to the goal is represented by a succession of springs, which can be deformed in response to environmental changes. The EB’s intrinsic forces constrain neighbouring path nodes, although the approach struggles with exact kinematic constraints. Choosing the final point of the trajectory is likewise challenging in an emergency. In autonomous driving, vehicle control [23] is responsible for following the theoretical trajectory produced by the planning algorithm [24], [25]. Vehicle kinematic models, such as the bicycle model, have been employed to translate this trajectory (x(t), y(t)) into a series of commands, mostly steering angle and acceleration. Several control methods have been utilised for comparatively low-speed driving environments, including PID controllers, pure pursuit controllers, and Stanley controllers. At high speeds or with a significant curvature change rate, dynamic model-based control approaches function better. Nonlinear control, Model Predictive Control, and feedback-feedforward control can improve vehicle stability at high speeds. These methods, however, presuppose that the traffic environment is fully known, including the intentions of other road users. In view of the environmental uncertainty, the safety decision-making task is typically modelled as a Markov decision process, sometimes a partially observable one, and applied to numerous driving scenarios. The primary theme of the safety planning method is the planning of driving manoeuvres, i.e. the development of optimum driving behaviours for a particular scenario, based on the trajectories of participant vehicles. The rapid growth of machine learning has allowed classical methods to be combined with learning-based methods, more recently reinforcement learning, to make autonomous decisions in highly interactive environments.
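The translation of a trajectory point into steering and acceleration commands with a kinematic bicycle model can be sketched as below, together with a pure pursuit steering law. The wheelbase of 2.7 m, the time step, and all names are assumptions for illustration; none of them come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class BicycleState:
    x: float = 0.0      # position [m]
    y: float = 0.0      # position [m]
    yaw: float = 0.0    # heading [rad]
    v: float = 0.0      # speed [m/s]


def bicycle_step(s, steer_rad, accel, dt=0.05, wheelbase_m=2.7):
    """Advance the rear-axle kinematic bicycle model by one explicit Euler step."""
    return BicycleState(
        x=s.x + s.v * math.cos(s.yaw) * dt,
        y=s.y + s.v * math.sin(s.yaw) * dt,
        yaw=s.yaw + s.v / wheelbase_m * math.tan(steer_rad) * dt,
        v=s.v + accel * dt,
    )


def pure_pursuit_steer(s, target_xy, wheelbase_m=2.7):
    """Steering angle that drives the rear axle on an arc through `target_xy`."""
    dx, dy = target_xy[0] - s.x, target_xy[1] - s.y
    lookahead = math.hypot(dx, dy)
    if lookahead == 0.0:
        return 0.0
    alpha = math.atan2(dy, dx) - s.yaw          # bearing of the target in the vehicle frame
    return math.atan2(2.0 * wheelbase_m * math.sin(alpha), lookahead)
```

A target straight ahead yields zero steering, and a target to the left yields a positive (counter-clockwise) steering angle.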
[Figure 2 labels: Lane Keeping, Lane Change, Environment.]
Figure 2: Design of Training Environment for SPS (Safety Planning Scheme) Mechanism to Avoid Multiple Vehicle Collision.
III. METHODOLOGY
This study is based on the concept of cooperative and competitive strategies among multiple autonomous agent vehicles. Building a conceptual framework that involves perception, communication and cooperation, threat assessment, decision making, and finally vehicle control requires several modules, which form part of our core project. A multi-constrained problem, including risk prediction, is resolved by an optimal safety planning application to mitigate multiple vehicle collisions. As far as the general architecture is concerned, we develop an environment scenario in which the ego vehicle takes safety decisions based on upcoming obstacles, including lane maintenance and lane-change decisions; the overall design is shown in Figure 2.
A. Deep Deterministic Policy Gradient
The learning method resembles human learning and is modelled as a Markov decision process (S, A, P, R). DeepMind proposed the DQN algorithm in 2013, which opened a new era in deep reinforcement learning. The key enhancements of the algorithm are the use of experience replay and the construction of a second, target network to remove the correlation among the training samples and increase the training stability [26]. Algorithms developed from DQN have made significant advances on problems with discrete action spaces. However, continuous control remains quite challenging for them. DeepMind proposed the DDPG method based on the DPG and DQN algorithms in 2015, and a normalization step was introduced into the deep learning setting. Experiments have shown that the approach works effectively on numerous types of continuous control problems. The DDPG algorithm is an actor-critic technique.
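The replay mechanism mentioned above, which DDPG inherits from DQN, can be sketched as a fixed-size buffer with uniform random sampling. The class name and capacity are hypothetical; the default batch size of 32 matches the value reported in the experimental setup.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000, seed=None):
        self._buffer = deque(maxlen=capacity)   # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Draw a uniform random minibatch, which breaks temporal correlation."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

Sampling uniformly from past transitions, instead of training on consecutive steps, is what removes the correlation among training samples.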
[Figure 3 block labels: Parameter Distribution, Communication [VANET], Collision Threat Prediction, DDPG Agent.]
Figure 3: Demonstration of SPS (Safety Planning Scheme) Agent.
In an actor-critic algorithm, the actor function π(s|µ) produces an action given the current state. The critic estimates an action-value function Q(s, a|A) on the basis of the output of the actor and the current state. The TD errors produced by the critic drive learning in the critic network, and then the actor network is updated on the basis of the policy gradient. The DDPG algorithm merges the benefits of the actor-critic and DQN algorithms to facilitate convergence. DDPG adopts certain DQN ideas, such as the use of a target network [27]. Following this approach, we build a training agent, which is illustrated in Figure 3.
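The actor and critic updates described above follow the standard DDPG formulation, which can be written with the actor parameters µ and critic parameters A as below. The discount factor γ, the soft-update rate τ, the minibatch size N, and the primed target-network parameters µ′ and A′ are symbols introduced here for illustration; the paper does not state values for γ or τ.

```latex
% TD target computed with the target actor and target critic
y_j = r_j + \gamma \, Q'\!\left(s_{j+1}, \pi'(s_{j+1} \mid \mu') \mid A'\right)

% Critic loss: mean squared TD error over a minibatch of N transitions
L(A) = \frac{1}{N} \sum_{j=1}^{N} \left( y_j - Q(s_j, a_j \mid A) \right)^2

% Actor update: sampled deterministic policy gradient
\nabla_{\mu} J \approx \frac{1}{N} \sum_{j=1}^{N}
  \nabla_a Q(s, a \mid A) \big|_{s = s_j,\, a = \pi(s_j \mid \mu)} \,
  \nabla_{\mu} \pi(s \mid \mu) \big|_{s = s_j}

% Soft update of the target networks
A' \leftarrow \tau A + (1 - \tau) A', \qquad
\mu' \leftarrow \tau \mu + (1 - \tau) \mu'
```

The slowly moving target networks keep the TD target stable while the critic is trained, which is the DQN idea that DDPG reuses.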
B. Experimental Setup
Extensive experiments are conducted to quantify the two main
autonomous driving metrics, namely the total rewards, which show
the overall success of our scheme and the number of collisions
across both cooperative and competitive approaches to the system.
We use the Unity3D game engine to create the environment depicted in Figure 2 to represent the entire driving system. The road is designed to be approximately 886 m long and 15 m wide.
We progressively introduced vehicles to the path and observed the
performance in relation to their learning behavior. For example,
two conventional vehicles (CVs) operate on the highway in these
simulations. We then change the number of DDPG-equipped au-
tonomous driving vehicles (AVs) for testing the proposed driving
scheme. To define the reward function, we follow Figure 4 and the equation used by Sugiyama et al. [28] for the vehicle motion and optimal-velocity model:

$$\frac{d^{2} z_i}{dt^{2}} = a \left\{ V(\Delta z_i) - \frac{d z_i}{dt} \right\}$$

Here $V(\Delta z_i)$ is the optimal-velocity function, $z_i(t)$ indicates the position of the $i$th vehicle at time $t$, $\Delta z_i(t)$ $(= z_{i+1}(t) - z_i(t))$ is the headway of vehicle $i$ at time $t$, and $a$ is the sensitivity (the inverse of the delay time). This is achieved by comparing the performance of different simulations. To realise the cooperative and competitive approaches, the reward values are varied, and each vehicle’s communication range R is 80 m, i.e. [0, 80] meters. The vehicles have a speed range of 80 km/h and 120 km/h, respectively. The values of the other parameters are defined as steering [-45, 45], acceleration [0, 1], brake [0, 1], and angle [-90, 90]. In TensorFlow, we built the driving policy using networks with two hidden layers of non-linear neurons in order to obtain the optimum policy. The two layers have about 300 and 400 neurons. The learning rate is 1e−6, and the batch size is 32. In order to compare the
performance of our proposed approach, we apply the decentralized collision avoidance policy for multi-robot systems called POMDP, suggested by Long et al. [29]. POMDP employs a multi-agent deep reinforcement learning architecture to enable several robots to develop an ideal collision avoidance technique. An experimental episode represents one stage on the real-world circuit and hence a whole run from the beginning to the end. However, the run ends abruptly in other cases, such as when an agent turns back or leaves the track edge owing to an accident. As a leading agent, a vehicle sends its learned model parameters to its following agents within one local network, while the following agents remain inactive and await the parameters learned by their leading agent. As we developed a non-deterministic autonomous driving environment, we performed 10 experiments and calculated the average results over all runs.
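The optimal-velocity model used in the setup above can be illustrated with a short explicit Euler simulation. The tanh-shaped form of V, its parameters (a maximum speed of 33.3 m/s, roughly 120 km/h, and a 25 m safe gap), the time step, and the function names are all assumptions; the paper does not give the form of V or the integration scheme.

```python
import math


def optimal_velocity(headway_m, v_max=33.3, safe_gap_m=25.0, scale_m=10.0):
    """Assumed tanh-shaped optimal-velocity function V(dz); not specified in the paper."""
    return 0.5 * v_max * (math.tanh((headway_m - safe_gap_m) / scale_m)
                          + math.tanh(safe_gap_m / scale_m))


def simulate(positions, speeds, leader_speed, a=1.0, dt=0.05, steps=200):
    """Integrate d2z_i/dt2 = a * (V(dz_i) - dz_i/dt) with explicit Euler steps.

    positions are ordered from the rear vehicle to the front vehicle; the front
    vehicle (the leader) is not governed by the model and moves at leader_speed.
    Returns the final positions and speeds.
    """
    z, v = list(positions), list(speeds)
    n = len(z)
    for _ in range(steps):
        # Headway of vehicle i is z[i + 1] - z[i], as in the model definition.
        acc = [a * (optimal_velocity(z[i + 1] - z[i]) - v[i]) for i in range(n - 1)]
        for i in range(n - 1):
            z[i] += v[i] * dt
            v[i] += acc[i] * dt
        z[n - 1] += leader_speed * dt
        v[n - 1] = leader_speed
    return z, v
```

Setting `leader_speed=0.0` mimics the sudden-stop situation studied in [28]: the followers' optimal velocity drops as their headway shrinks, and a low sensitivity `a` makes them react too slowly to avoid closing the gap.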
[Figure 4 labels: Vehicle 1, Blockage, Vehicle 2, Vehicle 3, Vehicle 4, placed along the x axis with x = 0 marked.]
Figure 4: Presentation of Rewards Function Regarding The Conditions of Multiple Vehicle Collision by Sudden Slowdown.
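The leader-follower parameter sharing described in the experimental setup can be sketched as below, using the 80 m communication range stated there. The `Agent` class, the dictionary stand-in for network weights, and the function name are hypothetical; the paper does not describe its message format or update schedule.

```python
from dataclasses import dataclass, field

COMM_RANGE_M = 80.0  # communication range R stated in the experimental setup


@dataclass
class Agent:
    name: str
    position_m: float                            # position along the road
    params: dict = field(default_factory=dict)   # stand-in for network weights


def distribute_parameters(leader, followers, comm_range_m=COMM_RANGE_M):
    """Copy the leader's parameters to every follower inside the communication range.

    Returns the names of the followers that received an update.
    """
    updated = []
    for follower in followers:
        if abs(follower.position_m - leader.position_m) <= comm_range_m:
            # Shallow copy so a follower does not alias the leader's dictionary.
            follower.params = dict(leader.params)
            updated.append(follower.name)
    return updated
```

In this sketch a follower outside the range simply keeps its previous parameters until it comes back within reach of a leader.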
IV. RESULT & DISCUSSION
During the simulations, the agents learn the optimal driving behaviour for mastering collision avoidance by using the proposed driving scheme in Unity3D. The objective of each agent is to improve its conduct dynamically by learning to prevent collisions with other agents and objects near it. In the meantime, its time of arrival is also minimised. An autonomous vehicle system must scale efficiently as the number of participants fluctuates. We thus test the system’s scalability under different densities of participating AVs. To assess driving performance, we employed two conventional vehicles (CVs) and one autonomous vehicle (AV) equipped with DDPG. The performance of the AV agents is evaluated by considering mainly the number of collisions suffered by the agents and the rewards achieved during the tests, presented in Figure 5 and Figure 6. The first set of results is cooperative and the second is competitive; the driving scheme includes two CVs with one and two AVs running DDPG. Figure 6 presents the average number of collisions over a span of 500 episodes during the training process. In this figure, every data point is obtained by summing the collisions over 50 episodes. From the data, we note that the average number of collisions in all scenarios decreases as the number of episodes rises. The average number of collisions for the competitive strategy is approximately 211% higher than for the cooperative strategy. The reason is that the parameter distribution approach of DDPG is used in the system to increase the AVs’ performance. The DDPG rewards of nearby autonomous vehicles in a communication network are compared to obtain each vehicle’s optimal driving policy. The highest reward represents the best driving behaviour of the autonomous vehicles. Furthermore, Figure 6 shows that DDPG’s competitive approach requires more training time to reach zero collisions. The cooperative approach of DDPG reaches the zero-collision objective about 13% faster in the training phase.
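The aggregation of collisions over 50-episode windows and the percentage comparison between strategies can be sketched as follows. The helper names are hypothetical and any inputs used with them here are synthetic, not the paper's data.

```python
def binned_sums(collisions_per_episode, bin_size=50):
    """Sum per-episode collision counts over consecutive bins of `bin_size` episodes."""
    return [
        sum(collisions_per_episode[i:i + bin_size])
        for i in range(0, len(collisions_per_episode), bin_size)
    ]


def percent_higher(value, baseline):
    """Return how much higher `value` is than `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline
```

Read this way, a figure of about 211% higher means that the competitive strategy records roughly 3.11 times as many collisions as the cooperative one, and 500 episodes yield ten data points per curve.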
Figure 5: For Each Approach, The Average Episode Rewards
Against Time Step.
Figure 6: For Each Approach, The Number of Collisions Per
Episode.
The cooperative rewards gained by the autonomous driving vehicles are measured, and the results are shown in Figure 5. The results show the sum of the rewards recorded over all time steps of training. The average reward of the safety scheme increases as the number of episodes grows. At first, the reward is low in all situations because each autonomous vehicle is initialised with random learning parameters; the reward of the competitive approach in particular is very low for a few periods. The vehicles therefore cannot choose the right action for their next move, leading to chain crashes. The cooperative strategy penalizes unsuccessful actions by reducing rewards, allowing DDPG to learn from its mistakes. Then, depending on its previous experiences, it may choose the correct action in a future episode. As the agent’s learning experience improves with each episode, the rewards begin to rise. The reason is that the cars have learnt to take appropriate actions to prevent collisions. In this situation, rewards are given to encourage the desired driving behaviour and limit undesired behaviour. We note that DDPG’s competitive approach has a lower reward than the cooperative strategy over a period of 500 episodes. The average reward of the cooperative autonomous driving vehicles is approximately 33% greater than that of the competitive DDPG driving vehicles. In addition, when a collision is possible, the DDPG algorithm also assigns penalties when the agents move too close together. As a result, the agents no longer collide with other vehicles because they have acquired better policies.
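The reward terms discussed above (collision penalty, proximity penalty, progress toward a shorter time of arrival, and staying in the lane) can be combined into one scalar as sketched below. All weights, the 10 m safe gap, and the names are hypothetical; the paper does not report the numeric reward values it used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the paper does not report numeric values."""
    collision: float = -100.0    # penalty when a collision occurs
    proximity: float = -1.0      # penalty scale for driving too close
    progress: float = 0.1        # reward per metre travelled (shorter time of arrival)
    off_lane: float = -10.0      # penalty for leaving the lane


def step_reward(collided, gap_m, progress_m, on_lane, safe_gap_m=10.0, w=RewardWeights()):
    """Combine the reward terms named in the text into one scalar."""
    reward = w.progress * progress_m
    if collided:
        reward += w.collision
    if gap_m < safe_gap_m:
        # Penalty grows linearly as the gap shrinks below the safe gap.
        reward += w.proximity * (safe_gap_m - gap_m) / safe_gap_m
    if not on_lane:
        reward += w.off_lane
    return reward
```

Making the collision penalty much larger than the other terms reflects the priority on safety, while the proximity term gives the agent a learning signal before a crash actually happens.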
V. CONCLUSIONS
In this paper, we analysed in depth the needs and the design objectives of multiple autonomous vehicles. In order to achieve these objectives, we have presented an effective safety planning scheme. We have devised an efficient collision prevention technique using a multi-actor hierarchy. The DDPG rewards are defined by taking into account collision avoidance under cooperative and competitive approaches, minimising time of arrival, and lane maintenance. The reward value indicates the quality of a driving action; for example, a higher reward indicates better safe driving behaviour. To speed up the learning process, the parameter distribution approach is used, with a private communication technique in a leader-follower multi-agent manner. We have demonstrated through rigorous testing in Unity3D that the safety scheme efficiently reduces the number of collisions and scales with an increasing number of autonomous vehicles. We also observed cautious safety learning, in which vehicles learnt not to collide with other vehicles but to leave the lane and stop. This enhanced driving behaviour while lowering danger and liability. As part of our future work, we intend to improve safety by incorporating camera-based images or videos of the environment into the learning process. In addition, we contemplate enhancing the safety scheme to adapt it to situations where roads impose no speed range.
VI. ACKNOWLEDGMENT
The authors would like to thank Universiti Malaysia Pahang for laboratory facilities as well as additional financial support under Internal Research grant RDU192202. Further thanks go to the Ministry of Higher Education of Malaysia for providing financial support under Fundamental Research Grant Scheme (FRGS) No. FRGS/1/2018/TK08/UMP/02/2 (University reference RDU190137).
REFERENCES
[1] T. B. Sarwar, N. M. Noor, M. S. U. Miah, M. Rashid, F. Al Farid,
and M. N. Husen, “Recommending research articles: A multi-level
chronological learning-based approach using unsupervised keyphrase
extraction and lexical similarity calculation,” IEEE Access, 2021.
[2] S. F. Kamarulzaman and S. Yasunobu, “Cooperative multi-knowledge
learning control system for obstacle consideration,” in International
Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems. Springer, 2014, pp. 506–515.
[3] A. Karim, M. A. Islam, P. Mishra, A. J. M. Muzahid, A. Yousuf,
M. M. R. Khan, and C. K. M. Faizal, “Yeast and bacteria co-culture-
based lipid production through bioremediation of palm oil mill effluent:
a statistical optimization,” Biomass Conversion and Biorefinery, pp. 1–
12, 2021.
[4] S. A. Murad, Z. R. M. Azmi, Z. H. Hakami, N. J. Prottasha, and
M. Kowsher, “Computer-aided system for extending the performance
of diabetes analysis and prediction,” in 2021 International Conference
on Software Engineering & Computer Systems and 4th International
Conference on Computational Science and Information Management
(ICSECS-ICOCSIM). IEEE, 2021, pp. 465–470.
[5] M. Kowsher, A. Tahabilder, and S. A. Murad, “Impact-learning: a robust
machine learning algorithm,” in Proceedings of the 8th international
conference on computer and communications management, 2020, pp.
9–13.
[6] M. A. Rahim, M. A. Rahman, M. M. Rahman, A. T. Asyhari, M. Z. A.
Bhuiyan, and D. Ramasamy, “Evolution of iot-enabled connectivity and
applications in automotive industry: A review,” Vehicular Communica-
tions, vol. 27, p. 100285, 2021.
[7] S. F. Kamarulzaman and H. Al Sibai, “Compound learning control
for formation management of multiple autonomous agents.” Pertanika
Journal of Science & Technology, vol. 25, 2017.
[8] A. Widyotriatmo and K.-S. Hong, “Navigation function-based control of
multiple wheeled vehicles,” IEEE Transactions on Industrial Electronics,
vol. 58, no. 5, pp. 1896–1906, 2011.
[9] M. H. Alsibai, S. F. Kamarulzaman, H. A. Alfarra, and Y. H. Naif, “Real
time emergency auto parking system in driver lethargic state for accident
preventing,” in MATEC Web of Conferences, vol. 90. EDP Sciences,
2017, p. 01034.
[10] M. A. Rahim, M. A. Rahman, M. M. Rahman, A. T. Asyhari, M. Z. A.
Bhuiyan, and D. Ramasamy, “Evolution of iot-enabled connectivity and
applications in automotive industry: A review,” Vehicular Communica-
tions, vol. 27, p. 100285, 2021.
[11] Y. S. Chi and S. F. Kamarulzaman, “Intelligent gender recognition
system for classification of gender in malaysian demographic,” in
InECCE2019: Proceedings of the 5th International Conference on Elec-
trical, Control & Computer Engineering, Kuantan, Pahang, Malaysia,
29th July 2019, vol. 632. Springer Nature, 2020, p. 283.
[12] K. Jolly, R. S. Kumar, and R. Vijayakumar, “A bezier curve based path
planning in a multi-agent robot soccer system without violating the
acceleration limits,” Robotics and Autonomous Systems, vol. 57, no. 1,
pp. 23–33, 2009.
[13] W. S. Cheong, S. F. Kamarulzaman, and M. A. Rahman, “Implementa-
tion of robot operating system in smart garbage bin robot with obstacle
avoidance system,” in 2020 Emerging Technology in Computing, Com-
munication and Electronics (ETCCE). IEEE, 2020, pp. 1–6.
[14] K. P. Cheng, R. E. Mohan, N. H. K. Nhan, and A. V. Le, “Multi-
objective genetic algorithm-based autonomous path planning for hinged-
tetro reconfigurable tiling robot,” IEEE Access, vol. 8, pp. 121267–
121284, 2020.
[15] B. Li, Y. Ouyang, Y. Zhang, T. Acarman, Q. Kong, and Z. Shao, “Opti-
mal cooperative maneuver planning for multiple nonholonomic robots in
a tiny environment via adaptive-scaling constrained optimization,” IEEE
Robotics and Automation Letters, vol. 6, no. 2, pp. 1511–1518, 2021.
[16] L. C. Kiew, A. J. M. Muzahid, and S. F. Kamarulzaman, “Vehicle route
tracking system based on vehicle registration number recognition using
template matching algorithm,” in 2021 International Conference on
Software Engineering Computer Systems and 4th International Confer-
ence on Computational Science and Information Management (ICSECS-
ICOCSIM), 2021, pp. 249–254.
[17] A. J. M. Muzahid, S. F. Kamarulzaman, and M. A. Rahman, “Compar-
ison of ppo and sac algorithms towards decision making strategies for
collision avoidance among multiple autonomous vehicles,” in 2021 In-
ternational Conference on Software Engineering Computer Systems and
4th International Conference on Computational Science and Information
Management (ICSECS-ICOCSIM), 2021, pp. 200–205.
[18] L. Shangguan, J. A. Thomasson, and S. Gopalswamy, “Motion planning
for autonomous grain carts,” IEEE Transactions on Vehicular Technol-
ogy, vol. 70, no. 3, pp. 2112–2123, 2021.
[19] M. A. Rahim, M. Rahman, M. A. Rahman, A. J. M. Muzahid, and S. F.
Kamarulzaman, “A framework of iot-enabled vehicular noise intensity
monitoring system for smart city,” Advances in Robotics, Automation
and Data Analytics: Selected Papers from ICITES 2020, vol. 1350, p.
194, 2021.
[20] M. Fu, H. G. Franquelim, S. Kretschmer, and P. Schwille, “Non-
equilibrium large-scale membrane transformations driven by minde
biochemical reaction cycles,” Angewandte Chemie International Edition,
vol. 60, no. 12, pp. 6496–6502, 2021.
[21] M. S. I. Shofiqul, N. Ab Ghani, and M. M. Ahmed, “A review on
recent advances in deep learning for sentiment analysis: Performances,
challenges and limitations,” 2020.
[22] S. F. Kamarulzaman and M. H. Alsibai, “Time-change-fuzzy-based
intelligent vehicle control system for safe emergency lane transition
during driver lethargic state,” Advanced Science Letters, vol. 24, no. 10,
pp. 7554–7558, 2018.
[23] J. Odili, M. N. M. Kahar, A. Noraziah, and S. F. Kamarulzaman, “A
comparative evaluation of swarm intelligence techniques for solving
combinatorial optimization problems,” International Journal of Ad-
vanced Robotic Systems, vol. 14, no. 3, p. 1729881417705969, 2017.
[24] A. J. M. Muzahid, S. F. Kamarulzaman, and M. A. Rahim, “Learning-
based conceptual framework for threat assessment of multiple vehicle
collision in autonomous driving,” in 2020 Emerging Technology in
Computing, Communication and Electronics (ETCCE). IEEE, 2020,
pp. 1–6.
[25] M. Rashid, M. Islam, N. Sulaiman, B. S. Bari, R. K. Saha, and M. J.
Hasan, “Electrocorticography based motor imagery movements classi-
fication using long short-term memory (lstm) based on deep learning
approach,” SN Applied Sciences, vol. 2, no. 2, pp. 1–7, 2020.
[26] M. Miah, J. Sulaiman, T. B. Sarwar, K. Z. Zamli, and R. Jose, “Study of
keyword extraction techniques for electric double-layer capacitor domain
using text similarity indexes: An experimental analysis,” Complexity, vol.
2021, 2021.
[27] M. M. Hasan, M. S. Islam, and S. Abdullah, “Robust pose-based
human fall detection using recurrent neural network,” in 2019 IEEE
International Conference on Robotics, Automation, Artificial-intelligence
and Internet-of-Things (RAAICON). IEEE, 2019, pp. 48–51.
[28] N. Sugiyama and T. Nagatani, “Multiple-vehicle collision induced by
a sudden stop in traffic flow,” Physics Letters A, vol. 376, no. 22, pp.
1803–1806, 2012.
[29] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang, and J. Pan, “Towards
optimally decentralized multi-robot collision avoidance via deep rein-
forcement learning,” in 2018 IEEE International Conference on Robotics
and Automation (ICRA). IEEE, 2018, pp. 6252–6259.