
Optimal Safety Planning and Driving Decision-Making for Multiple Autonomous Vehicles: A Learning Based Approach

December 2021


DOI:10.1109/ETCCE54784.2021.9689820

Conference: 2021 Emerging Technology in Computing, Communication and Electronics (ETCCE)

Authors:

Abu Jafar Md Muzahid

University of Tennessee

Md Abdur Rahim

Deakin University

Saydul Akbar Murad

University of Southern Mississippi

Syafiq Fauzi Kamarulzaman

Universiti Malaysia Pahang



2021 International Conference on Emerging Technology in Computing, Communication and Electronics

(ETCCE)

Optimal Safety Planning and Driving

Decision-Making for Multiple Autonomous

Vehicles: A Learning Based Approach

Abu Jafar Md Muzahid

Faculty of Computing

Universiti Malaysia Pahang

26600, Pekan, Pahang, Malaysia

mrumi98@gmail.com

Md. Abdur Rahim

Department of Mechanical Engineering

Universiti Malaysia Pahang

26300, Gambang, Pahang, Malaysia

mdabdurrahim.me2k7@gmail.com

Saydul Akbar Murad

Faculty of Computing

Universiti Malaysia Pahang

26600, Pekan, Pahang, Malaysia

saydulakbarmurad@gmail.com

Syaﬁq Fauzi Kamarulzaman

Faculty of Computing

Fellow of Automotive Engineering Center

Universiti Malaysia Pahang

26600, Pahang, Malaysia

syaﬁq29@ump.edu.my

Md Arafatur Rahman

School of Mathematics and Computer Science

Senior Lecturer

University of Wolverhampton, UK

Arafatur.rahman@wlv.ac.uk

Arafatur.rahman@ieee.org

Abstract—In the early diffusion stage of autonomous vehicle

systems, controlling vehicles through precise decision-making to reduce the number of collisions is a major problem. This paper offers a DRL-based safety planning decision-making scheme for emergencies that lead to both first and multiple collisions. Firstly, the lane-changing process and braking method are thoroughly analyzed, taking into account the critical aspects of developing an autonomous driving safety scheme. Secondly, we propose a DRL strategy that specifies the optimum driving techniques. We use a multiple-goal reward system to balance the accomplishment rewards from cooperative and competitive approaches, accident severity, and passenger comfort. Thirdly, the deep deterministic policy gradient (DDPG), a basic actor-critic (AC) technique, is used to mitigate the multiple-collision problem. This approach can improve the efficacy of the optimal strategy while remaining stable for continuous control mechanisms. In an emergency, the agent car can adopt optimum driving behaviors to enhance driving safety once its strategies are adequately trained. Extensive simulations show our concept's effectiveness and value in learning efficiency, decision accuracy, and safety.

Keywords— Autonomous driving, Multiple vehicle collision,

Robotics, Reinforcement Learning.

I. INTRODUCTION

Advances in AI technology have improved traffic efficiency and safety while also opening the road for autonomous vehicles. Algorithms capable of handling complicated scenarios are required to build the next generation of driver assistance systems or autonomous driving systems. Many researchers have proposed methods for perception, threat assessment [1], decision making, and vehicle control. However, decision-making in critical situations remains one of the key impediments to autonomous driving. A central difficulty in evaluating driving behavior is that most solutions are restricted to avoiding a single-vehicle collision and lack reliable trajectory forecasts for other participants. To address this difficulty, this research focuses on developing a safety planning decision-making scheme for autonomous vehicles in multi-vehicle crash scenarios. Numerous groups have looked into the problem of making sound strategic decisions for autonomous vehicles in a congested and dynamic urban setting. Figure 1 shows the evaluation of a multiple-vehicle collision and the timing of the avoidance interventions in safety planning. The scheme creates an optimal safety strategy based on reinforcement learning to prevent the impending first and chain collisions and reduce the severity of multiple crashes [2]. The problem of multi-vehicle collision resolution during unexpected deceleration and lane change is described as a multi-objective optimization problem (MOP) [3], with acceleration as the single decision variable. Our research intends to design a cooperative planning scheme for collision prevention that produces sequences of maneuver decisions in real time [4]. Reinforcement learning evaluates the actions taken in any given state by learning an approximate value function, and it is employed to form the overall decision-making process in our system. Combinations of the positions and orientations of both vehicles are treated as system states, whereas combinations of the movements of both vehicles are treated as actions. Because the state-action pairs are multidimensional

Authorized licensed use limited to: Universiti Malaysia Pahang. Downloaded on January 31,2022 at 01:07:10 UTC from IEEE Xplore. Restrictions apply.

[Figure 1 timeline labels: Time to collision; Collision warning (CW); Partial braking (PB1, PB2); Full braking (FB)]

Figure 1: Evaluation of Multiple Collision Caused By Sudden Lane Change Where The Ego Vehicle Receives CW (Collision Warning) From The Leading Vehicle and Immediately Determines The PB1 (Partial Brake Time-1) and PB2 (Partial Brake Time-2) Then Finally The FB (Full Brake Time).

and continuous, reinforcement learning helps with the difficult task of determining the value function of this multidimensional problem. Our scheme is novel in that the collision-avoidance safety planning problem is formulated as a sequential decision process in a continuous multidimensional space [5] and solved via reinforcement learning over a continuous action space. The key contribution of this work is the presentation of two solutions to the collision-avoidance multi-objective optimization problem. The deep deterministic policy gradient (DDPG) algorithm, a reinforcement learning method with continuous actions, is applied in both cooperative [6] and competitive settings to maximize the overall driving benefits. Because the deterministic policy gradient is the expected gradient of the action-value function, it can be estimated significantly more reliably than a typical stochastic policy gradient. The proposed decision-making approach and algorithms were evaluated in several typical scenarios using a simulation and verification platform based on the Unity3D game engine with ML-Agents and its Python API. The key contributions of this paper are as follows:

1) To avoid multi-vehicle crashes in emergency and severe conditions, a general decision-support safety scheme for driving multiple autonomous vehicles is presented, which combines two alternative driving techniques and retains the efficiency of the driving strategy.

2) A novel driving strategy is developed that keeps the steering and acceleration of the ego vehicle bounded.

3) The new strategy is implemented and tested using the Unity3D game engine, and the associated performance outcomes are assessed.

The rest of the paper is organised as follows. Section II reviews prior work in this domain. Section III discusses the approach employed in this study, including an overview of the deep deterministic policy gradient algorithm for reinforcement learning, the simulation setup, and model training details. Section IV presents the simulation verification platform and evaluates the efficacy and reliability of the proposed method. Finally, conclusions and future work are provided in Section V.
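The intervention timeline of Figure 1 (collision warning, two partial-braking stages, full braking) can be illustrated with a small sketch that maps a time-to-collision (TTC) value to a stage. The paper does not state numeric thresholds, so the values in `STAGE_THRESHOLDS`, the function names, and the constant-speed TTC formula are assumptions made only for illustration.

```python
from typing import Optional

# Hypothetical TTC thresholds in seconds, ordered from most to least urgent.
# The paper gives no numeric values; these are placeholders.
STAGE_THRESHOLDS = [
    ("FB", 0.8),   # full braking
    ("PB2", 1.6),  # second partial-braking stage
    ("PB1", 2.4),  # first partial-braking stage
    ("CW", 3.2),   # collision warning
]


def time_to_collision(gap_m: float, ego_speed_mps: float,
                      lead_speed_mps: float) -> Optional[float]:
    """Return the TTC in seconds, or None when the gap is not closing."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0.0:
        return None
    return gap_m / closing_speed


def intervention_stage(ttc_s: Optional[float]) -> str:
    """Return the most urgent stage whose threshold the TTC falls under."""
    if ttc_s is None:
        return "NONE"
    for stage, threshold in STAGE_THRESHOLDS:
        if ttc_s <= threshold:
            return stage
    return "NONE"
```

With these placeholder thresholds, a 30 m gap closing at 10 m/s gives a TTC of 3 s and triggers only the warning stage; a learned policy would replace the fixed thresholds with state-dependent decisions.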

II. RELATED WORKS

Collision avoidance for multiple vehicles is an actively studied topic in academia [7]. Most early research focused on two-dimensional safe path planning for a group of autonomous vehicles attempting to avoid stationary objects. Researchers have more recently focused on collision avoidance between vehicles. Some approaches [8] treat other vehicles as moving obstacles for each vehicle: by projecting their measured velocities, one can estimate where other vehicles will be in the future and prevent collisions accordingly. The work in [8] presented a collision-free approach to navigating a group of independent unmanned vehicles, in which the individual position and orientation information is translated into a navigation variable to construct the navigation function. However, that approach alone cannot constrain the vehicle's continuous speed or turning radius. Since an operation that is safe at the current time step may lead to future collisions, vehicles could have to alter direction immediately, which in many realistic circumstances is not possible because of the vehicles' motion limits. Other investigations [9] use parametric curves to shape smooth paths along which every moving object in the environment can ultimately reach its target. To trace these routes, however, vehicles must constantly vary their speed and direction, and the required changes are large and impractical. We therefore assume that cars travel at a constant speed and gradually change their orientation along circular arcs, so that the proposed algorithm is easy to execute in real time for safe path planning in large scenarios.
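The velocity-projection idea mentioned above can be made concrete with a minimal sketch: each vehicle is extrapolated at constant velocity, and the time and distance of closest approach are computed in closed form. This is a generic constant-velocity prediction, not the method of [8]; the function names and the planning horizon are assumptions.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def project(position: Vec, velocity: Vec, t: float) -> Vec:
    """Constant-velocity extrapolation of a 2-D position to time t."""
    return (position[0] + velocity[0] * t, position[1] + velocity[1] * t)


def closest_approach(p1: Vec, v1: Vec, p2: Vec, v2: Vec,
                     horizon: float) -> Tuple[float, float]:
    """Return (time, distance) of minimum separation within [0, horizon]."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]      # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t_star = 0.0                            # separation never changes
    else:
        # Minimiser of |r + v t|, clamped to the prediction horizon.
        t_star = -(rx * vx + ry * vy) / speed_sq
        t_star = min(max(t_star, 0.0), horizon)
    a = project(p1, v1, t_star)
    b = project(p2, v2, t_star)
    return t_star, math.hypot(b[0] - a[0], b[1] - a[1])
```

A predicted minimum distance below a chosen safety margin would flag a potential collision; as the text notes, this says nothing about whether the avoiding maneuver is kinematically feasible.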

In previous related research, each vehicle calculates its near-optimal path and plans its motion on its own by following a set of local rules [10], [11]. In [12], a localized route planning technique for soccer robots is provided, and the turning radius limits and the robot's speed constraints are explored. In many circumstances, however, a localized path planning scheme [13] cannot manage arbitrary traffic because of the kinematic unpredictability of the vehicles involved. Some studies use heuristic techniques such as genetic algorithms to address this challenge by including all vehicles in the scheme [14]. For instance, the research in [15] proposes an optimal crash avoidance strategy for multiple robots, enabling them to avoid prospective collisions without causing new ones. However, it is difficult for a vehicle to alter its route if it faces several potential crashes, since the system requires tremendous computational effort [16], [17].

In dynamic and continuous autonomous driving situations, the safety planning problem is often addressed with artificial potential field (APF) techniques [18], which are formulated through continuous, static potential-field equations. APF systems for control and navigation can be quickly deployed and executed in real time. These algorithms attempt to reach an objective by applying virtual forces to the vehicle: an attractive force pulls it toward the goal, while repulsive forces push it away from obstacles on the trajectory. One advantage of this approach is that it can take various constraints into account simply by adding particular forces. To use APFs in distinct autonomous driving scenarios, new potentials have been proposed depending on road structure, vehicle physics [19], and intersections. Various enhancements have been suggested to address further limitations of the classic APF algorithm for autonomous driving. To avoid the local minimum problem, the modified APF model can include a virtual obstacle or an adjusted target point location. The close-obstacle problem can be addressed by changing the computation of the APF using fuzzy logic. An additional artificial friction force to reduce oscillations was introduced in [20], [21]. While APF algorithms may be sufficient to reach the goal, the vehicle's resulting path is unpredictable, which can lead to hazardous scenarios.
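As a rough illustration of the virtual forces used by APF methods, the sketch below sums a linear attractive force toward the goal and the classic inverse-distance repulsive force from each obstacle inside an influence radius. The gains `k_att` and `k_rep`, the influence radius, and the function name are assumptions chosen for illustration and are not taken from the cited works.

```python
import math
from typing import Iterable, Tuple

Vec = Tuple[float, float]


def apf_force(pos: Vec, goal: Vec, obstacles: Iterable[Vec],
              k_att: float = 1.0, k_rep: float = 100.0,
              influence: float = 10.0) -> Vec:
    """Return the net virtual force acting on a vehicle at `pos`."""
    # Attractive term: proportional to the offset from the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Repulsive term: active only inside the influence radius.
        if 0.0 < d < influence:
            magnitude = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += magnitude * dx / d
            fy += magnitude * dy / d
    return fx, fy
```

When the attractive and repulsive terms cancel away from the goal, the net force is zero, which is the local minimum problem discussed above.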

Another essential drawback of this approach is that it is difficult to account for the vehicle's kinematic constraints [22], so the mechanical feasibility of the paths cannot be guaranteed. Elastic band (EB) methods are likewise derived from physical analogies. The planned path to the goal is represented by a succession of springs, which can deform in response to environmental changes. The EB's internal forces constrain neighbouring path nodes, although the approach struggles with exact kinematic constraints. Choosing the final point of the trajectory is likewise challenging in an emergency. In autonomous driving, vehicle control [23] is responsible for following the theoretical trajectory computed by the planning algorithm [24], [25]. Vehicle kinematic models, such as the bicycle model, are employed to translate this trajectory (x(t), y(t)) into a series of commands, mostly steering angle and acceleration. Several control methods have been used for comparatively low-speed driving environments, including PID controllers,


pure pursuit controllers, and Stanley controllers. At high speeds or with a significant curvature change rate, dynamic model-based control approaches perform better. Nonlinear control, Model Predictive Control, and feedback-feedforward control can improve vehicle stability at high speeds. These methods, however, presuppose that the traffic environment is fully known, including the intentions of other road users. Because of environmental uncertainty, the safety decision-making task is typically modelled as a Markov Decision Process, sometimes a partially observable one, and has been applied to numerous driving scenarios. The primary focus of the safety planning method is the planning of driving manoeuvres, i.e. the development of optimum driving behaviours for a particular scenario, based on the trajectories of participating vehicles. The rapid growth of machine learning has allowed classical methods to be combined with learning-based methods and, more recently, reinforcement learning to make autonomous decisions in highly interactive environments.
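The kinematic bicycle model mentioned in this section relates steering angle and acceleration commands to the trajectory (x(t), y(t)). A minimal forward-Euler sketch of the rear-axle form is given below; the wheelbase, time step, and class and function names are assumptions, and angles are in radians.

```python
import math
from dataclasses import dataclass


@dataclass
class BicycleState:
    x: float        # position along the road, m
    y: float        # lateral position, m
    heading: float  # yaw angle, rad
    speed: float    # longitudinal speed, m/s


def bicycle_step(state: BicycleState, steering_rad: float,
                 acceleration: float, wheelbase: float = 2.7,
                 dt: float = 0.05) -> BicycleState:
    """Advance the kinematic bicycle model by one Euler step."""
    x = state.x + state.speed * math.cos(state.heading) * dt
    y = state.y + state.speed * math.sin(state.heading) * dt
    heading = (state.heading
               + state.speed / wheelbase * math.tan(steering_rad) * dt)
    speed = max(0.0, state.speed + acceleration * dt)  # no reversing
    return BicycleState(x, y, heading, speed)
```

A tracking controller such as pure pursuit or Stanley would choose `steering_rad` at each step so that the simulated state follows the planned trajectory.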

[Figure 2 labels: Lane Keeping; Lane Change; Environment]

Figure 2: Design of Training Environment for SPS (Safety Planning Scheme) Mechanism to Avoid Multiple Vehicle Collision.

III. METHODOLOGY

This study is based on the concept of cooperative and competitive strategies among multiple autonomous agent vehicles. Our core project requires a conceptual framework with modules for perception, communication and cooperation, threat assessment, decision making, and finally vehicle control. An optimal safety planning application, including risk prediction, resolves the multi-constrained problem of mitigating multiple vehicle collisions. In terms of general architecture, we develop an environment scenario in which the ego vehicle makes safety decisions based on upcoming obstacles, including lane-keeping and lane-change decisions; the overall design is shown in Figure 2.

A. Deep Deterministic Policy Gradient

The learning method resembles human learning and is modelled as a Markov decision process (S, A, P, R). DeepMind proposed the DQN algorithm in 2013, which opened a new era in deep reinforcement learning. The key enhancements of the algorithm are the use of experience replay and the construction of a second, target network, which break the correlation among training samples and increase training stability [26]. Algorithms derived from DQN have made significant advances on problems with discrete action spaces. However, continuous control remains considerably more challenging. DeepMind proposed the DDPG method in 2015, based on the DPG and DQN algorithms, and brought batch normalization from deep learning into the approach. Experiments have shown that the approach works effectively on numerous types of continuous control problems. DDPG is an actor-critic technique.
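The replay mechanism inherited from DQN can be sketched as a fixed-size buffer from which uniformly random mini-batches are drawn, which is what breaks the correlation between consecutive training samples. The class below is a minimal illustration with assumed names and a seeded generator for reproducibility; it is not the authors' implementation.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], Sequence[float], float,
                   Sequence[float], bool]


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # A bounded deque silently drops the oldest transition when full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state: Sequence[float], action: Sequence[float],
            reward: float, next_state: Sequence[float], done: bool) -> None:
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a mini-batch without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)
```

In training, transitions are added after every environment step and a batch is sampled once the buffer holds at least one batch worth of data.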

[Figure 3 block labels: Parameter Distribution; Communication [VANET]; Collision Threat Prediction; DDPG Agent]

Figure 3: Demonstration of SPS (Safety Planning Scheme) Agent.

In an actor-critic algorithm, the actor function π(s|µ) produces an action given the current state. The critic evaluates the action-value function Q(s, a|A) on the basis of the actor's output and the current state. The TD errors produced by the critic drive learning in the critic network, and the actor network is then updated on the basis of the policy gradient. The DDPG algorithm combines the benefits of the actor-critic and DQN algorithms to facilitate convergence. In particular, DDPG adopts the target network idea from DQN [27]. Following this approach, we build a training agent, illustrated in Figure 3.
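The critic's TD error and the target-network mechanism described in this subsection can be sketched with scalar arithmetic. Here `next_q` stands for the target critic's value of the action chosen by the target actor in the next state. The discount factor `gamma` and the soft-update rate `tau` are not stated in this paper; the defaults below follow the original DDPG formulation and are assumptions, as are the function names.

```python
from typing import List


def td_target(reward: float, next_q: float, done: bool,
              gamma: float = 0.99) -> float:
    """Bootstrapped target y = r + gamma * Q'(s', mu'(s')), or r at episode end."""
    return reward if done else reward + gamma * next_q


def td_error(q_value: float, reward: float, next_q: float, done: bool,
             gamma: float = 0.99) -> float:
    """Difference between the target and the critic's current estimate."""
    return td_target(reward, next_q, done, gamma) - q_value


def soft_update(target_params: List[float], online_params: List[float],
                tau: float = 0.001) -> List[float]:
    """Move each target parameter a fraction tau toward its online value."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

The critic is trained to reduce the squared TD error, while the slowly moving target parameters keep the bootstrapped targets stable.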

B. Experimental Setup

Extensive experiments are conducted to quantify the two main autonomous driving metrics across both the cooperative and competitive approaches: the total reward, which shows the overall success of our scheme, and the number of collisions. We use the Unity3D game engine to create the environment depicted in Figure 2, which represents the entire driving system. The road is approximately 886 m long and 15 m wide. We progressively introduced vehicles to the road and observed the performance in relation to their learning behavior. In these simulations, two conventional vehicles (CVs) operate on the highway. We then vary the number of DDPG-equipped autonomous vehicles (AVs) to test the proposed driving scheme. To define the reward function, we follow Figure 4 and the vehicle motion and velocity model given by the equation of N. Sugiyama et al. [28]:

$$\frac{d^{2} z_i}{dt^{2}} = a\left\{ V(\Delta z_i) - \frac{dz_i}{dt} \right\}$$

Here $V(\Delta z_i)$ is the optimal-velocity function, $z_i(t)$ is the position of the $i$-th vehicle at time $t$, $\Delta z_i(t)$ $(= z_{i+1}(t) - z_i(t))$ is the headway of vehicle $i$ at time $t$, and $a$ is the sensitivity (the inverse of the delay time). The evaluation is carried out by comparing the performance of different simulations. To realize the cooperative and competitive approaches, the reward values are varied, and each vehicle's communication range R is 80 m, i.e. [0, 80] meters. The vehicles have a speed range of 80 km/h and 120 km/h, respectively. The values of the other parameters are defined as steering [-45, 45],


acceleration [0, 1], brake [0, 1], and angle [-90, 90]. We built the driving model in TensorFlow, using neural networks with two hidden layers as non-linear function approximators to obtain the optimum policy. The two hidden layers have about 300 and 400 neurons. The learning rate is 1e-6, and the batch size is 32. To compare the performance of our proposed approach, we apply the decentralized collision avoidance policy for multi-robot systems, called POMDP, suggested by Pinxin et al. [29]. POMDP employs a multi-agent deep reinforcement learning architecture to enable several robots to learn an optimal collision avoidance policy. An experimental episode corresponds to one run on the circuit, that is, a whole race from beginning to end. However, the race ends early in some cases, such as when an agent turns back or leaves the track edge because of an accident. As a leading agent, a vehicle sends its learned model parameters to its following agents within one local network, while the following agents remain idle and wait for the parameters learned by their leading agent. Because the driving environment we developed is non-deterministic, we performed 10 experiments and averaged the results over all runs.
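The car-following equation used in this setup, in which acceleration is the sensitivity times the gap between the optimal velocity for the current headway and the actual velocity, can be integrated numerically as sketched below. The paper does not give the form of the optimal-velocity function, so the dimensionless tanh form common in the optimal-velocity literature is assumed here; the function names, the Euler scheme, and the treatment of the front vehicle (constant speed) are also assumptions.

```python
import math
from typing import List, Tuple


def optimal_velocity(headway: float) -> float:
    """Assumed optimal-velocity function V (dimensionless tanh form)."""
    return math.tanh(headway - 2.0) + math.tanh(2.0)


def ov_step(positions: List[float], velocities: List[float],
            sensitivity: float = 1.0,
            dt: float = 0.1) -> Tuple[List[float], List[float]]:
    """One Euler step; vehicle i follows vehicle i + 1, the last one leads."""
    last = len(positions) - 1
    accelerations = []
    for i, (z, v) in enumerate(zip(positions, velocities)):
        if i == last:
            accelerations.append(0.0)  # lead vehicle keeps its speed
        else:
            headway = positions[i + 1] - z
            accelerations.append(
                sensitivity * (optimal_velocity(headway) - v))
    new_positions = [z + v * dt for z, v in zip(positions, velocities)]
    new_velocities = [v + acc * dt
                      for v, acc in zip(velocities, accelerations)]
    return new_positions, new_velocities
```

A follower that is slower than the optimal velocity for its headway accelerates, and one that is faster decelerates, which is the behaviour behind the sudden-slowdown scenario of Figure 4.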

[Figure 4 labels: Vehicle 1; Blockage; Vehicle 2; Vehicle 3; Vehicle 4; position axis x with origin x = 0]

Figure 4: Presentation of Rewards Function Regarding The Conditions of Multiple Vehicle Collision by Sudden Slowdown.

IV. RESULT & DISCUSSION

During simulations in Unity3D, the agents learn the optimal driving behaviour for collision avoidance by using the proposed driving scheme. The objective of each agent is to improve its behaviour dynamically by learning to prevent collisions with other agents and nearby objects while also minimising its time of arrival. An autonomous vehicle system must scale efficiently as the number of participants fluctuates. We thus test the system's scalability at different densities of participating AVs. To assess driving performance, we employed two conventional vehicles (CVs) and one autonomous vehicle (AV) equipped with DDPG. The performance of the AV agents is evaluated mainly by the number of collisions suffered by the agents and the rewards achieved during tests, presented in Figure 5 and Figure 6. Results are reported for two approaches, the first cooperative and the second competitive, with a driving scheme that includes two CVs and one or two AVs running DDPG. Figure 6 presents the average number of collisions over a span of 500 episodes during the training process. In this figure, each point is obtained by summing the collisions over every 50 episodes. From the data, we note that the average number of collisions decreases in all scenarios as the number of episodes rises. The average number of collisions under the competitive strategy is approximately 211% higher than under the cooperative strategy. The reason is that the cooperative system uses the parameter distribution approach of DDPG to improve AV performance. The DDPG rewards of nearby autonomous vehicles in a communication network are compared to determine each vehicle's optimal driving policy. The highest reward represents the best driving behaviour of the autonomous vehicles. Furthermore, Figure 6 shows that DDPG's competitive approach requires more training time to reach zero collisions. The cooperative approach reaches the zero-collision objective 13% faster in the training phase.
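The aggregation behind Figure 6 and the relative comparison quoted above can be sketched in a few lines: collisions are summed over consecutive blocks of 50 episodes, and "X% higher" is computed relative to the baseline value. The function names are assumptions and no data from the paper is reproduced.

```python
from typing import List, Sequence


def bucket_sums(collisions_per_episode: Sequence[int],
                bucket_size: int = 50) -> List[int]:
    """Sum collisions over consecutive blocks of `bucket_size` episodes."""
    return [sum(collisions_per_episode[i:i + bucket_size])
            for i in range(0, len(collisions_per_episode), bucket_size)]


def percent_higher(value: float, baseline: float) -> float:
    """Relative increase of `value` over `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline
```

Under this definition, a figure of 211% higher corresponds to roughly 3.11 times the baseline, not 2.11 times.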

Figure 5: For Each Approach, The Average Episode Rewards

Against Time Step.

Figure 6: For Each Approach, The Number of Collisions Per

Episode.

The rewards gained by autonomous vehicles are measured, and the results are shown in Figure 5. The results show the sum of the reward recorded over all time steps of training. The average reward of the safety scheme increases as the number of episodes grows. At first, the reward is low in all situations, because each autonomous vehicle starts with random learning parameters; the reward of the competitive approach is especially low in the first few episodes. The vehicles therefore cannot choose the right action for their next move, which leads to chain crashes. The cooperative strategy penalizes unsuccessful actions by reducing rewards, allowing DDPG to learn from its mistakes. Then, drawing on its previous experiences, the agent may choose the correct action in a future episode. As the agent's learning experience improves with each episode, the rewards begin to rise, because the cars have learnt to take appropriate actions to prevent collisions. In this setting, rewards and penalties are given to encourage the desired driving behaviour and limit undesired behaviour. We note that DDPG's competitive approach has a lower reward than the cooperative strategy over a period of 500 episodes. The average reward of autonomous vehicles under the cooperative approach is approximately 33% greater than under the competitive DDPG approach. In addition, when a collision is possible, the DDPG algorithm awards penalties when the agents move too close together. As a result, the agents stop colliding with other vehicles because they have acquired better policies.
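The reward behaviour discussed in this section (a penalty for collisions, a penalty for moving too close to another agent, and an incentive to arrive quickly) could take a shape like the sketch below. The paper does not publish its reward formula, so every weight, the safe-gap value, the off-lane penalty, and the function name are hypothetical.

```python
def step_reward(collided: bool, min_gap_m: float, progress_m: float,
                off_lane: bool, safe_gap_m: float = 10.0) -> float:
    """Hypothetical per-step reward; all weights are illustrative only."""
    if collided:
        return -100.0                      # large terminal penalty
    reward = 0.1 * progress_m              # progress shortens arrival time
    if min_gap_m < safe_gap_m:
        # Proximity penalty grows linearly from 0 to 1 as the gap closes.
        reward -= (safe_gap_m - min_gap_m) / safe_gap_m
    if off_lane:
        reward -= 1.0                      # discourage leaving the lane
    return reward
```

With a shaping term of this kind, the agent receives a graded signal before any collision occurs, which is consistent with the observation that penalties are awarded when agents move too close together.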


V. CONCLUSIONS

In this paper, we analysed in depth the needs and design objectives of multiple autonomous vehicles. To achieve these objectives, we presented an effective safety planning scheme and devised an efficient collision prevention technique using a multi-actor hierarchy. The DDPG rewards are defined by taking into account collision avoidance under the cooperative and competitive approaches, minimising time of arrival, and lane maintenance. The reward value indicates the quality of a driving action; for example, a higher reward indicates better safe-driving behaviour. To speed up the learning process, the parameter distribution approach is used over a private communication channel in a leader-follower multi-agent manner. Through rigorous testing in Unity3D, we have demonstrated that the safety scheme efficiently reduces the number of collisions and scales with an increasing number of autonomous vehicles. We also observed cautious safety learning, in which vehicles learnt to leave the lane and stop rather than collide with other vehicles. This improved driving behaviour while lowering danger and liability. As part of our future work, we intend to improve safety by incorporating camera-based images or videos of the environment into the learning process. In addition, we plan to enhance the safety scheme so that it adapts to situations where roads impose no speed range.

VI. ACKNOWLEDGMENT

The authors would like to thank Universiti Malaysia Pahang for laboratory facilities as well as additional financial support under Internal Research grant RDU192202. Further thanks go to the Ministry of Higher Education of Malaysia for providing financial support under Fundamental Research Grant Scheme (FRGS) No. FRGS/1/2018/TK08/UMP/02/2 (University reference RDU190137).

REFERENCES

[1] T. B. Sarwar, N. M. Noor, M. S. U. Miah, M. Rashid, F. Al Farid,

and M. N. Husen, “Recommending research articles: A multi-level

chronological learning-based approach using unsupervised keyphrase

extraction and lexical similarity calculation,” IEEE Access, 2021.

[2] S. F. Kamarulzaman and S. Yasunobu, “Cooperative multi-knowledge

learning control system for obstacle consideration,” in International

Conference on Information Processing and Management of Uncertainty

in Knowledge-Based Systems. Springer, 2014, pp. 506–515.

[3] A. Karim, M. A. Islam, P. Mishra, A. J. M. Muzahid, A. Yousuf,

M. M. R. Khan, and C. K. M. Faizal, “Yeast and bacteria co-culture-

based lipid production through bioremediation of palm oil mill efﬂuent:

a statistical optimization,” Biomass Conversion and Bioreﬁnery, pp. 1–

12, 2021.

[4] S. A. Murad, Z. R. M. Azmi, Z. H. Hakami, N. J. Prottasha, and

M. Kowsher, “Computer-aided system for extending the performance

of diabetes analysis and prediction,” in 2021 International Conference

on Software Engineering & Computer Systems and 4th International

Conference on Computational Science and Information Management

(ICSECS-ICOCSIM). IEEE, 2021, pp. 465–470.

[5] M. Kowsher, A. Tahabilder, and S. A. Murad, “Impact-learning: a robust

machine learning algorithm,” in Proceedings of the 8th international

conference on computer and communications management, 2020, pp.

9–13.

[6] M. A. Rahim, M. A. Rahman, M. M. Rahman, A. T. Asyhari, M. Z. A.

Bhuiyan, and D. Ramasamy, “Evolution of iot-enabled connectivity and

applications in automotive industry: A review,” Vehicular Communica-

tions, vol. 27, p. 100285, 2021.

[7] S. F. Kamarulzaman and H. Al Sibai, “Compound learning control

for formation management of multiple autonomous agents.” Pertanika

Journal of Science & Technology, vol. 25, 2017.

[8] A. Widyotriatmo and K.-S. Hong, “Navigation function-based control of

multiple wheeled vehicles,” IEEE Transactions on Industrial Electronics,

vol. 58, no. 5, pp. 1896–1906, 2011.

[9] M. H. Alsibai, S. F. Kamarulzaman, H. A. Alfarra, and Y. H. Naif, “Real

time emergency auto parking system in driver lethargic state for accident

preventing,” in MATEC Web of Conferences, vol. 90. EDP Sciences,

2017, p. 01034.

[10] M. A. Rahim, M. A. Rahman, M. M. Rahman, A. T. Asyhari, M. Z. A.

Bhuiyan, and D. Ramasamy, “Evolution of iot-enabled connectivity and

applications in automotive industry: A review,” Vehicular Communica-

tions, vol. 27, p. 100285, 2021.

[11] Y. S. Chi and S. F. Kamarulzaman, “Intelligent gender recognition

system for classiﬁcation of gender in malaysian demographic,” in

InECCE2019: Proceedings of the 5th International Conference on Elec-

trical, Control & Computer Engineering, Kuantan, Pahang, Malaysia,

29th July 2019, vol. 632. Springer Nature, 2020, p. 283.

[12] K. Jolly, R. S. Kumar, and R. Vijayakumar, “A bezier curve based path

planning in a multi-agent robot soccer system without violating the

acceleration limits,” Robotics and Autonomous Systems, vol. 57, no. 1,

pp. 23–33, 2009.

[13] W. S. Cheong, S. F. Kamarulzaman, and M. A. Rahman, “Implementa-

tion of robot operating system in smart garbage bin robot with obstacle

avoidance system,” in 2020 Emerging Technology in Computing, Com-

munication and Electronics (ETCCE). IEEE, 2020, pp. 1–6.

[14] K. P. Cheng, R. E. Mohan, N. H. K. Nhan, and A. V. Le, “Multi-

objective genetic algorithm-based autonomous path planning for hinged-

tetro reconfigurable tiling robot,” IEEE Access, vol. 8, pp. 121267–121284, 2020.

[15] B. Li, Y. Ouyang, Y. Zhang, T. Acarman, Q. Kong, and Z. Shao, “Opti-

mal cooperative maneuver planning for multiple nonholonomic robots in

a tiny environment via adaptive-scaling constrained optimization,” IEEE

Robotics and Automation Letters, vol. 6, no. 2, pp. 1511–1518, 2021.

[16] L. C. Kiew, A. J. M. Muzahid, and S. F. Kamarulzaman, “Vehicle route

tracking system based on vehicle registration number recognition using

template matching algorithm,” in 2021 International Conference on

Software Engineering Computer Systems and 4th International Confer-

ence on Computational Science and Information Management (ICSECS-

ICOCSIM), 2021, pp. 249–254.

[17] A. J. M. Muzahid, S. F. Kamarulzaman, and M. A. Rahman, “Compar-

ison of ppo and sac algorithms towards decision making strategies for

collision avoidance among multiple autonomous vehicles,” in 2021 In-

ternational Conference on Software Engineering Computer Systems and

4th International Conference on Computational Science and Information

Management (ICSECS-ICOCSIM), 2021, pp. 200–205.

[18] L. Shangguan, J. A. Thomasson, and S. Gopalswamy, “Motion planning for autonomous grain carts,” IEEE Transactions on Vehicular Technology, vol. 70, no. 3, pp. 2112–2123, 2021.

[19] M. A. Rahim, M. Rahman, M. A. Rahman, A. J. M. Muzahid, and S. F. Kamarulzaman, “A framework of IoT-enabled vehicular noise intensity monitoring system for smart city,” Advances in Robotics, Automation and Data Analytics: Selected Papers from ICITES 2020, vol. 1350, p. 194, 2021.

[20] M. Fu, H. G. Franquelim, S. Kretschmer, and P. Schwille, “Non-equilibrium large-scale membrane transformations driven by MinDE biochemical reaction cycles,” Angewandte Chemie International Edition, vol. 60, no. 12, pp. 6496–6502, 2021.

[21] M. S. I. Shofiqul, N. Ab Ghani, and M. M. Ahmed, “A review on recent advances in deep learning for sentiment analysis: Performances, challenges and limitations,” 2020.

[22] S. F. Kamarulzaman and M. H. Alsibai, “Time-change-fuzzy-based intelligent vehicle control system for safe emergency lane transition during driver lethargic state,” Advanced Science Letters, vol. 24, no. 10, pp. 7554–7558, 2018.

[23] J. Odili, M. N. M. Kahar, A. Noraziah, and S. F. Kamarulzaman, “A comparative evaluation of swarm intelligence techniques for solving combinatorial optimization problems,” International Journal of Advanced Robotic Systems, vol. 14, no. 3, p. 1729881417705969, 2017.

[24] A. J. M. Muzahid, S. F. Kamarulzaman, and M. A. Rahim, “Learning-based conceptual framework for threat assessment of multiple vehicle collision in autonomous driving,” in 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE). IEEE, 2020, pp. 1–6.

[25] M. Rashid, M. Islam, N. Sulaiman, B. S. Bari, R. K. Saha, and M. J. Hasan, “Electrocorticography based motor imagery movements classification using long short-term memory (LSTM) based on deep learning approach,” SN Applied Sciences, vol. 2, no. 2, pp. 1–7, 2020.

[26] M. Miah, J. Sulaiman, T. B. Sarwar, K. Z. Zamli, and R. Jose, “Study of keyword extraction techniques for electric double-layer capacitor domain using text similarity indexes: An experimental analysis,” Complexity, vol. 2021, 2021.

[27] M. M. Hasan, M. S. Islam, and S. Abdullah, “Robust pose-based human fall detection using recurrent neural network,” in 2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON). IEEE, 2019, pp. 48–51.

[28] N. Sugiyama and T. Nagatani, “Multiple-vehicle collision induced by a sudden stop in traffic flow,” Physics Letters A, vol. 376, no. 22, pp. 1803–1806, 2012.

[29] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang, and J. Pan, “Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 6252–6259.

