
Behavior Cloning for Autonomous Driving using Convolutional Neural Networks

Authors:
Wael Farag1,2,4, Zakaria Saleh3,5
1American University of the Middle East (AUM), Kuwait.
2Electrical Eng., Cairo University, Egypt
3University of Bahrain, Bahrain.
4wael.farag@aum.edu.kw, 5zsaleh@uob.edu.bh
Abstract—In this paper, we propose using a Convolutional Neural Network (CNN) to learn safe driving behavior and smooth steering maneuvering as an empowerment of autonomous driving technologies. The training data is collected from a front-facing camera and the steering commands issued by an experienced driver driving in traffic as well as on urban roads. This data is then used to train the proposed CNN to facilitate what we call behavioral cloning. The proposed behavior cloning CNN is named "BCNet", and its deep seventeen-layer architecture has been selected after extensive trials. The BCNet is trained using the Adam optimization algorithm, a variant of the Stochastic Gradient Descent (SGD) technique. The paper goes through the development and training process in detail and shows the image processing pipeline harnessed in the development. The proposed approach proved successful in cloning the driving behavior embedded in the training data set after extensive simulations.
Keywords—Behavioral Cloning, Convolutional Neural
Network, Autonomous Driving, Machine Learning
I. INTRODUCTION
In the past decade, the automobile industry has made a shift towards intelligent vehicles equipped with driving assistance systems [1-2] and has recently introduced vision systems in its high-end cars. The vision system (the cameras mounted in the car, including the front-facing ones) is being utilized by autonomous-driving engineers to develop many future self-driving car features, such as: a) road-lane finding; b) free driving-space finding; c) traffic-sign detection and recognition [3-7]; d) traffic-light detection and recognition; and e) road-object detection and tracking. In this paper, we propose to use the mounted car vision system (more specifically, the front-facing camera) to improve the safety and the driving behavior of future self-driving cars.
The main idea is to construct a Convolutional Neural Network (CNN) that is able to learn safe driving maneuvers from data collected while an expert driver drives on urban roads. The main focus of this paper is to let the proposed CNN map raw pixels from a single front-facing camera directly to steering commands of the car. This is an end-to-end approach that lets the car drive on highways without lane markings and on roads with unclear visual guidance, such as parking lots and unpaved roads [8]. The CNN automatically learns internal representations of the necessary processing pipeline steps, such as detecting useful road features, with only the human steering angle as the training signal. In comparison with the explicit decomposition of the autonomous driving problem into lane-marking detection, path planning, and control, the proposed end-to-end CNN optimizes all processing steps simultaneously.
II. THE CNN ARCHITECTURE
The proposed architecture is a seventeen-layer behavior cloning CNN model given the name "BCNet". The model is coded using Keras [9] on top of TensorFlow [10] in Python [11]. Fig. 1 illustrates the BCNet architecture, and Table I below describes the architecture in detail:
TABLE I. BCNET ARCHITECTURE.

| # | Layer | Size/Output | Parameters | Comment |
|---|-------|-------------|------------|---------|
| 1 | Input | 160x320x3 | -- | Color, 3 channels (RGB) |
| 2 | Normalization | 160x320x3 | lambda x: x/127.5 - 1 | Scaling the inputs to [-1, 1] using the Keras Lambda function |
| 3 | Cropping | 65x320x3 = 62,400 | Cut 70 pixels from the top and 25 pixels from the bottom | New height = (160 - 70) - 25 = 65 |
| 4 | Convolutional #1 | 31x159x24 | 24 filters, 5x5 kernel, 2x2 strides, valid padding, ReLU activation | No pooling; output size = [(W-F+2P)/S]+1 |
| 5 | Convolutional #2 | 14x78x36 | 36 filters, 5x5 kernel, 2x2 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 6 | Convolutional #3 | 5x38x48 | 48 filters, 5x5 kernel, 2x2 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 7 | Convolutional #4 | 3x36x64 | 64 filters, 3x3 kernel, 1x1 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 8 | Convolutional #5 | 1x34x64 | 64 filters, 3x3 kernel, 1x1 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 9 | Flatten | 1x34x64 = 2,176 | Keras Flatten function | -- |
| 10 | Drop-out | 2,176 | Keep probability: 0.5 => 0.7 | -- |
| 11 | Fully Connected #1 | 200 | Keras Dense layer | With biases |
| 12 | Drop-out | 200 | Keep probability: 0.5 => 0.7 | -- |
| 13 | Fully Connected #2 | 100 | Keras Dense layer | With biases |
| 14 | Drop-out | 100 | Keep probability: 0.5 => 0.7 | -- |
| 15 | Fully Connected #3 | 20 | Keras Dense layer | With biases |
| 16 | Drop-out | 20 | Keep probability: 0.5 => 0.7 | -- |
| 17 | Fully Connected #4 | 1 | Keras Dense layer | Output layer |
Four drop-out layers are added to prevent over-fitting during training, and the fully connected layers are widened. No pooling layers are used here, since this is a regression problem rather than a classification one. Additionally, all the convolutional layers are sized according to the input image dimensions after normalization and cropping.
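For illustration, the layer stack of Table I can be sketched in Keras as follows (a minimal illustrative sketch, not the exact implementation; note that the Keras Dropout layer takes a drop rate, i.e., 1 − keep probability):

```python
# A minimal Keras sketch of the BCNet layer stack in Table I.
from keras.models import Sequential
from keras.layers import Lambda, Cropping2D, Conv2D, Flatten, Dropout, Dense

def build_bcnet(keep_prob=0.5):
    model = Sequential()
    # Layers 1-3: input, scaling to [-1, 1], and cropping 70/25 pixels
    model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=(160, 320, 3)))
    model.add(Cropping2D(cropping=((70, 25), (0, 0))))      # -> 65x320x3
    # Layers 4-8: five convolutional layers, valid padding, ReLU, no pooling
    model.add(Conv2D(24, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
    model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
    # Layers 9-17: flatten, then alternating drop-out and dense layers
    model.add(Flatten())
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(200))
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(100))
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(20))
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(1))          # single output neuron: the steering angle
    return model
```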
III. THE TRAINING DATA SET
The following are the two main sources of data utilized to construct the training data set used to train the BCNet:
1) Source 1 - Udacity Supplied Data [12]: This collection, with an unzipped size of 365 MB, consists of 24,108 images equally divided between center, left, and right front-camera shots. Each image is 160x320 pixels with 3 RGB color channels. The index of the data is stored in a CSV file that contains 8,036 lines of records.
2) Source 2 - Simulator Generated Data: collected using the open-source Udacity driving simulator [13]. The recorded data set has an unzipped size of 808 MB and consists of 49,851 images equally divided between center, left, and right front-camera shots. Each image is 160x320 pixels with 3 RGB color channels. The index of the data is stored in a CSV file that contains 16,617 lines of records. The data has been generated by driving the car manually around Track 1 in the mentioned simulator several times (~10 times) with driving behavior as safe as possible. In particular, it is encouraged to include "recovery" data while training. This means that data should be captured starting from the point of approaching the edge of the track (perhaps nearly missing a turn and almost driving off the track) and recording the process of steering the car back toward the center of the track, to give the model a chance to learn recovery behavior.
Several subroutines have been written for data visualization and analysis. These act as a sanity check to verify that the preprocessing is not fundamentally flawed. Flawed data will almost certainly confuse the model and result in unacceptable performance. An example of the output of these subroutines is presented in Fig. 2, which displays a sample of the generated training data (source 2), and Fig. 3, which presents the histogram of the steering-angle values collected during driving (source 2).
The data (both sources 1 and 2) is divided into two separate parts: training data, which represents 80% of the chunk, and validation data, which represents the remaining 20%.
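For illustration, indexing such a CSV log and performing the 80/20 split can be sketched as follows (assuming one record per line with center, left, and right image paths followed by the steering angle; the file name is illustrative):

```python
# A sketch of indexing a driving log and splitting it 80/20.
import csv
from sklearn.model_selection import train_test_split

samples = []
with open('driving_log.csv') as f:
    for row in csv.reader(f):
        # row: [center_path, left_path, right_path, steering, ...]
        center, left, right, steering = row[0], row[1], row[2], float(row[3])
        samples.append((center, left, right, steering))

# 80% training / 20% validation, as described above
train_samples, valid_samples = train_test_split(samples, test_size=0.2)
```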
Fig. 1. The BCNet Architecture.
IV. DRIVING DATA PRE-PROCESSING
Before using the front camera's images in the training/validation data sets, these images need to be pre-processed to make them more useful and convenient throughout the learning process. The pre-processing steps are meant to improve the training results and reduce the computation as much as possible. The following steps describe the implemented pre-processing in order of execution:
1) Normalization (color): this is done for color images using the Lambda function in Keras [9] by simply implementing min-max scaling. The values of the RGB pixels are scaled to the -1 → 1 range and centered on zero instead of the 0 → 255 range.
2) Cropping images: The images have been cropped from the
top by 70 pixels and from the bottom by 25 pixels, in order
to focus on the region of interest (ROI) and to reduce the
number of inputs (faster learning process). The cropped
images have the size of 65x320x3.
3) Flipping images: The data has been doubled (augmented)
by flipping all the images (around the y-axis) and reversing
the sign of the corresponding steering angle. Accordingly,
the source-1 data becomes 48,216 samples, and source-2
data becomes 99,702 samples. In other words, each CSV
line record can generate 6 training samples (center, left,
right, flipped-center, flipped-left, and flipped-right).
4) Jittering images: To minimize the model's tendency to
overfit to the conditions of the test track, images are
"jittered" before being fed to the BCNet. The jittering
consists of a randomized brightness adjustment, a
randomized shadow, and a randomized horizon shift. The
shadow effect is simply a darkening of a random
rectangular portion of the image, starting at either the left
or right edge and spanning the height of the image. The
horizon shift applies a perspective transform beginning at
the horizon line (at roughly 2/5 of the height) and shifting
it up or down randomly by up to 1/8 of the image height.
The horizon shift is meant to mimic the topology
conditions of the test track.
5) Data Distribution Flattening: Because the test track includes long sections with very slight or no curvature, the data captured from it tends to be heavily skewed toward low and zero turning angles. This creates a problem for the neural network, which then becomes biased toward driving in a straight line and can become easily confused by sharp turns. The distribution of the input data can be observed in Fig. 3. To reduce the occurrence of low- and zero-angle data points, a histogram of the turning angles is produced and the average number of samples per bin is computed. Next, a "keep probability" for the samples belonging to each bin is determined. That keep probability is 1.0 for bins that contain fewer than the computed average samples per bin; for the other bins, the keep probability is calculated as the average samples per bin divided by the number of samples in that bin, so that over-represented bins are thinned. Finally, random data points from the data set are removed at a rate of (1 - "keep probability"); a sketch of this procedure appears after the pipeline summary below. The resulting data distribution can be seen in Fig. 4. The distribution is not uniform overall, but it is much closer to uniform for low and zero turning angles. This method helped speed up the training process, as a smaller but higher-quality data set is used.
6) Cleaning the dataset: it was discovered that the model performed especially poorly on certain data points, and those data points were then found to be mislabeled in several cases. A subroutine was created to display the frames from the dataset on which the model performs worst. The intent was to manually adjust the steering angles for the mislabeled frames. Even though this approach is tedious, it helped to improve the results of the training to some extent.
7) Shuffling the training data: this is done once each training epoch to avoid pattern memorization and, consequently, getting trapped in local minima.
8) Using a generator function to load data into memory [14]: this step greatly smooths out the training process and is, in fact, mandatory: loading the whole data set into the computer memory was not possible (or at least not practical). Each batch (size = 128 images and angles) is generated and loaded into memory individually; a minimal generator sketch follows this list. The fit_generator() function from Keras [14] is used to manage the whole process.
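A minimal sketch of such a generator is given below (assuming each sample is an (image path, steering angle) pair produced by the indexing step; this is an illustrative sketch rather than the exact implementation):

```python
# A minimal batch generator in the spirit of step 8: each batch of
# images is read from disk only when needed, so the full data set
# never has to fit in memory at once.
import numpy as np
import matplotlib.image as mpimg
from sklearn.utils import shuffle

def batch_generator(samples, batch_size=128):
    # samples: list of (image_path, steering_angle) pairs
    while True:                          # Keras generators loop forever
        samples = shuffle(samples)       # re-shuffle once per pass
        for i in range(0, len(samples), batch_size):
            batch = samples[i:i + batch_size]
            images = np.array([mpimg.imread(path) for path, _ in batch])
            angles = np.array([angle for _, angle in batch], dtype=np.float32)
            yield images, angles
```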
Fig. 2. A sample of the collected images: center, left and right
respectively.
Finally, the pre-processing pipeline actually used is: Color Image → Normalization → Cropping → Flipping → Jittering → Shuffling → Batch Memory Loading.
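For illustration, the distribution-flattening procedure of step 5 can be sketched as follows (the number of histogram bins is an assumption, as it is not stated above):

```python
# A sketch of the data-distribution flattening in step 5: bins with
# more samples than the per-bin average are randomly thinned.
import numpy as np

def flatten_distribution(samples, angles, num_bins=25):
    angles = np.asarray(angles)
    counts, edges = np.histogram(angles, bins=num_bins)
    avg = counts.mean()
    # keep probability: 1.0 for under-represented bins,
    # (average / count) for over-represented ones
    keep = np.where(counts > avg, avg / np.maximum(counts, 1), 1.0)
    # map each angle to its histogram bin, then draw a keep/remove mask
    bins = np.clip(np.digitize(angles, edges[1:-1]), 0, num_bins - 1)
    mask = np.random.rand(len(angles)) < keep[bins]
    return [s for s, m in zip(samples, mask) if m]
```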
This pipeline proved sufficient and produced the required results. However, other techniques were considered and may be needed for future endeavors, as follows:
1) Converting color images to greyscale: reduces complexity considerably by cutting the size of the training data and the associated computation to one third.
2) Incorporating edge detection and lane finding using the Canny algorithm and the Hough transform.
3) Blur-filtering the input images using Gaussian filtering to remove noise.
Fig. 3. The histogram of the steering angles collected during driving.
Fig. 4. The histogram of the steering angles after flattening.
V. THE LEARNING ALGORITHM
In this work, the Adam learning algorithm [15] is used to train the proposed BCNet and update its weights iteratively based on the prepared training driving data. Adam is an optimization algorithm used instead of the classical stochastic gradient descent (SGD) learning algorithm [16]. The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
Adaptive Moment Estimation (Adam) [15] is a method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients $v_t$, Adam also keeps an exponentially decaying average of past gradients $m_t$, similar to momentum, and the update equations are given as follows:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (1)$$

$m_t$ and $v_t$ are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients respectively, hence the name of the method. As $m_t$ and $v_t$ are initialized as vectors of zeros, the authors of Adam observe that they are biased towards zero, especially during the initial time steps, and especially when the decay rates are small (i.e., $\beta_1$ and $\beta_2$ are close to 1).
These biases have been counteracted by computing bias-corrected first and second-moment estimates as follows:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (2)$$

The parameters are updated using the above estimates, which yields the following Adam update rule:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \qquad (3)$$

for all neural-network model parameters $\theta \in \mathbb{R}^d$ (weights and biases), where $\eta$ is the learning rate and $\epsilon$ is a very small constant that prevents division by zero.
The authors of Adam propose default values of 0.9 for $\beta_1$, 0.999 for $\beta_2$, and $10^{-8}$ for $\epsilon$. They show empirically that Adam works well in practice and compares favorably to other adaptive learning-rate algorithms [17].
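For concreteness, a single Adam update implementing equations (1)-(3) can be sketched in plain numpy as follows (an illustrative sketch, not the Keras internals):

```python
# One Adam update step, following equations (1)-(3).
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based time step of the update
    m = beta1 * m + (1 - beta1) * grad          # eq. (1), first moment
    v = beta2 * v + (1 - beta2) * grad ** 2     # eq. (1), second moment
    m_hat = m / (1 - beta1 ** t)                # eq. (2), bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # eq. (3)
    return theta, m, v
```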
The output layer of the proposed BCNet is a linear regression function [18]. The network models the system as a linear combination of features to produce the final estimated output $\hat{y}$. The function is given by

$$\hat{y} = h(x, w) = \sum_{j=0}^{N-1} w_j x_j + b \qquad (4)$$

where $x_j$ is the $j$th input to the output unit, $w_j$ is the $j$th weight of the $j$th input, $b$ is the bias term, $N$ is the number of inputs to the output unit (in other words, the size of the previous hidden layer), $w = [w_0, w_1, \ldots, w_j, \ldots, w_{N-1}]$ is the vector of weights, and $x = [x_0, x_1, \ldots, x_j, \ldots, x_{N-1}]$ is the vector of inputs.
The main task of the training process is to find the weights that provide the best fit for the training data. One way to measure this fit is to calculate the least-squares error (the data loss) over the training dataset:

$$L(w) = \sum_{i=1}^{M} \left( h(x_i, w) - y_i \right)^2 = \sum_{i=1}^{M} \left( \hat{y}_i - y_i \right)^2 \qquad (5)$$

where $L$ is the data loss function that needs to be minimized using Adam's algorithm, $y_i$ is the $i$th ground-truth sample output, $\hat{y}_i$ is the $i$th output estimate of the neural net, and $M$ is the number of training samples. Gradient descent is then applied to the loss's gradient $\nabla_w L(w)$ in order to minimize the overall error on the training data. Using this, the weights can be updated via the standard gradient-descent rule:

$$w = w - \eta\, \nabla_w L(w) \qquad (6)$$

where $\eta$ is the learning rate.
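In Keras, minimizing the loss of eq. (5) with Adam amounts to compiling the model with a mean-squared-error loss. A sketch, assuming the build_bcnet() and batch_generator() helpers sketched earlier:

```python
# Wiring eq. (5) (mean squared error) to the Adam optimizer in Keras.
from keras.optimizers import Adam

model = build_bcnet(keep_prob=0.5)
model.compile(optimizer=Adam(lr=0.001), loss='mse')
model.fit_generator(batch_generator(train_samples, batch_size=128),
                    steps_per_epoch=len(train_samples) // 128,
                    validation_data=batch_generator(valid_samples, 128),
                    validation_steps=len(valid_samples) // 128,
                    epochs=5)
```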
VI. THE BCNET TRAINING RESULTS
The BCNet model is trained using the parameters listed in
Tables II, III and IV using ADAM’s optimization algorithm.
Fig. 5 shows the setup of the BCNet used during the training
phase, while Fig. 6 shows the setup during the running and
simulation modes. Furthermore, the training results are
presented in Table V, Fig. 7 and Fig. 8. The state of the model
is a bit over-fitting after the training represented by Fig. 8. For
this reason, the learning rate is further reduced and the keep
probability increased.
Fig. 5. Overview of the CNN in training mode.
Fig. 6. Overview of the trained CNN in running mode.
TABLE II. BCNET TRAINING PARAMETERS (LEARNING RATE).

| Algorithm | Parameter | Value | Comment |
|-----------|-----------|-------|---------|
| ADAM Optimization | Learning Rate | 0.001 | For epochs 0-5 (data source 1) and epochs 0-5 (data source 2) |
| | | 0.0005 | For epochs 6-8 (data source 2) |
| | | 0.0002 | For epochs 8-9 (data source 2) |
TABLE III. BCNET TRAINING PARAMETERS (ANGLE CORRECTION).

| Parameter | Value | Comment |
|-----------|-------|---------|
| Left Angle Correction | 0.25 | Radians |
| Right Angle Correction | -0.25 | Radians |
TABLE IV. BCNET TRAINING PARAMETERS (OTHERS).

| Parameter | Value | Comment |
|-----------|-------|---------|
| Batch Size | 120 | For epochs 0-9 |
| Epochs | 15 | The whole training |
| Keep Probability | 0.5 => 0.7 | For drop-out layers |
TABLE V. BCNET TRAINING RESULTS.

| Phase | Data Type | Loss Value | Parameters | Comment |
|-------|-----------|------------|------------|---------|
| Phase 1: Coarse Tuning | Source-1 data | Training: 0.0235; Validation: 0.0205 | 5 epochs, learning rate = 0.001, keep prob. = 0.5 | Fig. 7. Coarse tuning with Udacity data; not enough for full learning. |
| Phase 2: Fine Tuning | Source-2 data | Training: 0.0455; Validation: 0.0411 | 5 epochs, learning rate = 0.001, keep prob. = 0.5 | Fig. 8. Fine tuning with self-collected data; proved enough for full learning with acceptable performance. |
| Phase 3: Fine Tuning | Source-2 data | Training: 0.0417; Validation: 0.0377 | 3 epochs, learning rate = 0.0005, keep prob. = 0.6 | Fig. 9. More fine tuning with self-collected data; caused over-fitting with somewhat inferior performance. |
| Phase 4: Fine Tuning | Source-2 data | Training: 0.0348; Validation: 0.0295 | 2 epochs, learning rate = 0.0002, keep prob. = 0.7 | Fig. 10. More fine tuning with self-collected data; full learning with very good performance. |
Fig. 7. Learning progress for data source 1 (Udacity Data).
Fig. 8. Learning progress for data source 2 (Self-Collected).
Fig. 9. Learning progress for data source 2 (Self-Collected).
Fig. 10. Learning progress for data source 2 (Self-Collected).
The training of the BCNet has been carried out through several trials to achieve the results presented in Table V. The following observations were collected during the training process:
1) Training the network using only the "source-1" Udacity-supplied data was tried several times, incorporating several ways of data augmentation; however, acceptable results were not achieved, and the car always hit the borders.
2) Using the Udacity driving simulator [13], training data has been collected by maneuvering the car with a keyboard or a joystick. Accordingly, useful data has been successfully collected for training by looping the car around, for example, "Track 1" several times (~10).
3) After coarse-tuning the model using the "source-1" data, the resulting model weights are reused for further training and fine-tuning on the "source-2" data, using an Adam learning rate of 0.001 as in Table V and Fig. 8. This coarse tuning followed by fine-tuning resembles the transfer-learning approach; a sketch of the staged schedule follows this list. Note that the two types of data are never used together. After this fine-tuning phase, the resulting model is tested on "Track 1" in the simulator and produces acceptable results (no unsafe or sudden maneuvering).
4) In order to improve the performance further, the learning rate of the Adam optimizer was halved to 0.0005 and the model was trained for a further 3 epochs. However, this resulted in an over-fitted model, as shown in Fig. 9. Testing confirmed the inferior performance, even though both the training and validation losses are lower than in the previous case. Consequently, this model was set aside for further fine-tuning.
5) The learning rate of the Adam optimizer was reduced further to 0.0002, the keep probability was increased to 0.7, and the model was trained for an extra 2 epochs (Fig. 10). The resulting model was then tested on "Track 1" and produced very good performance.
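The staged schedule above (phases 2-4 on the source-2 data) can be sketched as repeated re-compilation with a smaller Adam learning rate; the keep-probability change between phases is omitted for brevity, and checkpoint file names are illustrative:

```python
# A sketch of the coarse-then-fine schedule of Table V (phases 2-4):
# re-compile with a smaller Adam learning rate and continue training
# from the current weights.
from keras.optimizers import Adam

for lr, epochs in [(0.001, 5), (0.0005, 3), (0.0002, 2)]:
    model.compile(optimizer=Adam(lr=lr), loss='mse')
    model.fit_generator(batch_generator(train_samples, batch_size=128),
                        steps_per_epoch=len(train_samples) // 128,
                        epochs=epochs)
    model.save_weights('bcnet_phase_lr_{}.h5'.format(lr))  # checkpoint
```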
VII. SHORTCOMINGS OF THE IMPLEMENTED APPROACH
The following list summarizes the identified shortcomings:
1) The presented neural network model doesn't have a memory; it makes momentary decisions and doesn't build on previous states to make the current decision. However, driving is believed to be a sequential process, and the current approach doesn't mimic that.
2) After training the network on one track and testing it on another (considerably different from the first one), it may produce unacceptable results in some scenarios in terms of driving behavior, as it has never gone through these scenarios before. Accordingly, this approach may require the network to be exposed to a massive number of tracks in order to generalize well for actual street deployment (commercial application).
VIII. SUGGESTED IMPROVEMENTS
The following points summarize the suggested
improvements:
1) Other network topologies with memory, like Long Short-Term Memory (LSTM) models, need to be tested for behavior-cloning end-to-end learning.
2) The network needs to be trained on many more tracks, maneuvering scenarios, and road conditions in order to make it generalize as much as possible.
3) More useful data can be generated from the currently collected data by adding random distortion, brightness manipulation, jitter, rotation, etc.
4) Applying the concept of finite impulse response (FIR) filtering, or the moving-average concept, to the steering-angle estimate before issuing the final steering command, instead of using the raw estimated value directly. In such a case, the new estimate will depend on the previous history as well; a minimal sketch follows this list.
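For illustration, suggestion 4 can be sketched as an equal-weight moving-average (FIR) filter over the last few steering estimates (the window length is an assumed value):

```python
# Moving-average (FIR) smoothing of the raw steering estimate, as in
# suggestion 4. The window length is an assumed illustrative value.
from collections import deque

class SteeringSmoother:
    def __init__(self, window=5):
        self.history = deque(maxlen=window)   # last `window` raw estimates

    def smooth(self, raw_angle):
        self.history.append(raw_angle)
        # equal-weight FIR filter: average over the recent history
        return sum(self.history) / len(self.history)
```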
IX. CONCLUSION
In this paper, a CNN-based safe steering controller, "BCNet", has been proposed. The architecture of the CNN is presented in detail. The structure of the comprehensive training, validation, and testing data is described. The involved image-processing algorithms have been described as well, and their contributions are analyzed. The BCNet has shown that it is able to learn the entire task of lane and road following without manual decomposition into road or lane-marking detection, semantic abstraction, path planning, and control. A small amount of training data from one or two tracks was sufficient to train the car to drive safely on multiple tracks. The CNN is able to learn meaningful road features from a very sparse training signal (steering alone). It has been shown throughout the training process that the quality of data (much more than quantity) is crucial for this application. Therefore, a comprehensive pipeline of training-data pre-processing has been carefully implemented.
Moreover, the shortcomings of the proposed approach have been discussed, with improvement actions for future work proposed and elaborated. The presented solution constitutes a cornerstone in facilitating the existence of fully autonomous cars in the near future.
REFERENCES
[1] Karim Mansour, Wael Farag, “AiroDiag: A Sophisticated Tool that
Diagnoses and Updates Vehicles Software Over Air”, IEEE Intern.
Electric Vehicle Conference (IEVC), TD Convention Center
Greenville, SC, USA, March 4, 2012, ISBN: 978-1-4673-1562-3.
[2] Wael Farag, “CANTrack: Enhancing automotive CAN bus security
using intuitive encryption algorithms”, 7th Inter. Conf. on Modeling,
Simulation, and Applied Optimization (ICMSAO), UAE, March 2017.
[3] Long Chen, Qingquan Li, Ming Li, Qingzhou Mao, “Traffic sign
detection and recognition for intelligent vehicle”, IEEE Intelligent
Vehicles Symposium, June 2011, Baden-Baden, Germany.
[4] J. Greenhalgh and M. Mirmehdi, “Real-Time Detection and
Recognition of Road Traffic Signs”, IEEE trans. on intelligent
transportation systems, 13(4), Dec. 2012.
[5] Á. Arcos-García, J.A. Álvarez-García, L.M. Soria-Morillo, “Deep
neural network for traffic sign recognition systems: An analysis of
spatial transformers and stochastic optimisation methods”, Neural
Networks 99 (2018) 158–165, Elsevier.
[6] Wael Farag, Zakaria Saleh, "Traffic Signs Identification by Deep
Learning for Autonomous Driving", Smart Cities Symposium (SCS'18),
Bahrain, 22-23 April 2018.
[7] Wael Farag, “Recognition of traffic signs by convolutional neural nets
for self-driving vehicles”, International Journal of Knowledge-based
and Intelligent Engineering Systems, IOS Press, Vol: 22, No: 3, pp. 205
– 214, 2018.
[8] M Bojarski, D Del Testa, D Dworakowski, B Firner, B Flepp, P Goyal,
... et al., “End to End Learning for Self-Driving Cars”,
arXiv:1604.07316, 25 Apr 2016.
[9] Keras Documentation, https://keras.io/.
[10] TensorFlow, https://www.tensorflow.org/.
[11] Python, https://www.python.org/.
[12] Udacity Sample Training Data, https://d17h27t6h515a5.cloudfront.net/topher/2016/December/584f6edd_data/data.zip.
[13] Udacity Simulator, https://github.com/udacity/self-driving-car-sim.
[14] Shervine Amidi, https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.html.
[15] D.P. Kingma, J. Ba, “Adam: A Method for Stochastic Optimization”,
3rd Inter. Conf. for Learning Representations, San Diego, USA, 2015.
[16] Léon Bottou, "Online Algorithms and Stochastic Approximations",
Online Learning and Neural Nets, Cambridge Univ. Press, ISBN 978-
0-521-65263-6, (1998).
[17] Sebastian Ruder, “An overview of gradient descent optimization
algorithms”, arXiv:1609.04747v2, 15 Jun 2017.
[18] Wael Farag, “Synthesis of intelligent hybrid systems for modeling and
control”, University of Waterloo, Canada, 1998.