
Behavior Cloning for Autonomous Driving using Convolutional Neural Networks

Authors:
Wael Farag1,2,4, Zakaria Saleh3,5
1American University of the Middle East (AUM), Kuwait.
2Electrical Eng., Cairo University, Egypt
3University of Bahrain, Bahrain.
4wael.farag@aum.edu.kw, 5zsaleh@uob.edu.bh
Abstract—In this paper, we propose using a Convolutional Neural Network (CNN) to learn safe driving behavior and smooth steering maneuvering as an empowerment of autonomous driving technologies. The training data is collected from a front-facing camera and the steering commands issued by an experienced driver driving in traffic as well as on urban roads. This data is then used to train the proposed CNN to facilitate what we call behavioral cloning. The proposed behavior cloning CNN is named "BCNet", and its deep seventeen-layer architecture has been selected after extensive trials. The BCNet is trained using the Adam optimization algorithm, a variant of the Stochastic Gradient Descent (SGD) technique. The paper goes through the development and training process in detail and shows the image processing pipeline harnessed in the development. The proposed approach proved successful in cloning the driving behavior embedded in the training data set after extensive simulations.
Keywords—Behavioral Cloning, Convolutional Neural
Network, Autonomous Driving, Machine Learning
I. INTRODUCTION
In the past decade, the automobile industry has made a shift towards intelligent vehicles equipped with driving assistance systems [1-2] and has recently introduced vision systems in its high-end cars. The vision system (the cameras mounted in the car, including the front-facing ones) is being utilized by autonomous-driving engineers to develop many future self-driving car features, such as: a) road-lane finding; b) free driving-space finding; c) traffic-sign detection and recognition [3-7]; d) traffic-light detection and recognition; and e) road-object detection and tracking. In this paper, we propose to use the mounted car vision system (more specifically, the front-facing camera) to improve the safety and the driving behavior of future self-driving cars.
The main idea is to construct a Convolutional Neural Network (CNN) that is able to learn safe driving maneuvers from data collected while an expert driver drives on urban roads. The main focus of this paper is to let the proposed CNN map raw pixels from a single front-facing camera directly to steering commands of the car. This is an end-to-end approach that lets the car drive on highways without lane markings and on roads with unclear visual guidance, such as parking lots and unpaved roads [8]. The CNN automatically learns internal representations of the necessary processing pipeline steps, such as detecting useful road features, with only the human steering angle as the training signal. In comparison with the explicit decomposition of the autonomous driving problem into lane-marking detection, path planning, and control, the proposed end-to-end CNN optimizes all processing steps simultaneously.
II. THE CNN ARCHITECTURE
The proposed architecture is a seventeen-layer behavior cloning CNN model given the name "BCNet". The model is coded using Keras [9] on top of TensorFlow [10] in Python [11]. Fig. 1 illustrates the BCNet architecture, and Table I below describes the architecture in detail:
TABLE I. BCNET ARCHITECTURE.

| # | Layer | Size/Output | Parameters | Comment |
|---|-------|-------------|------------|---------|
| 1 | Input | 160x320x3 | -- | Color, 3 channels (RGB) |
| 2 | Normalization | 160x320x3 | lambda x: x/127.5 - 1 | Scaling the inputs to [-1, 1] using the Keras Lambda function |
| 3 | Cropping | 65x320x3 = 62,400 | Cut 70 pixels from the top and 25 pixels from the bottom | New height = (160 - 70) - 25 = 65 |
| 4 | Convolutional #1 | 31x159x24 | 24 filters, 5x5 kernel, 2x2 strides, valid padding, ReLU activation | No pooling; output size = [(W-F+2P)/S]+1 |
| 5 | Convolutional #2 | 14x78x36 | 36 filters, 5x5 kernel, 2x2 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 6 | Convolutional #3 | 5x38x48 | 48 filters, 5x5 kernel, 2x2 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 7 | Convolutional #4 | 3x36x64 | 64 filters, 3x3 kernel, 1x1 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 8 | Convolutional #5 | 1x34x64 | 64 filters, 3x3 kernel, 1x1 strides, valid padding, ReLU activation | No pooling; [(W-F+2P)/S]+1 |
| 9 | Flatten | 1x34x64 = 2,176 | Keras Flatten function | -- |
| 10 | Drop-out | 2,176 | Keep probability: 0.5 => 0.7 | -- |
| 11 | Fully Connected #1 | 200 | Keras Dense layer | With biases |
| 12 | Drop-out | 200 | Keep probability: 0.5 => 0.7 | -- |
| 13 | Fully Connected #2 | 100 | Keras Dense layer | With biases |
| 14 | Drop-out | 100 | Keep probability: 0.5 => 0.7 | -- |
| 15 | Fully Connected #3 | 20 | Keras Dense layer | With biases |
| 16 | Drop-out | 20 | Keep probability: 0.5 => 0.7 | -- |
| 17 | Fully Connected #4 | 1 | Keras Dense layer | Output layer |
Four drop-out layers are added to prevent over-fitting during training, and the fully connected layers are widened. No pooling layers are used here, since this is a regression problem rather than a classification one. Additionally, all the convolutional layers are sized according to the input image dimensions after normalization and cropping.
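For illustration, the layer stack of Table I can be sketched in Keras as follows (a minimal illustrative sketch, not the exact implementation; note that the Keras Dropout layer takes a drop rate, i.e., 1 − keep probability):

```python
# A minimal Keras sketch of the BCNet layer stack in Table I.
from keras.models import Sequential
from keras.layers import Lambda, Cropping2D, Conv2D, Flatten, Dropout, Dense

def build_bcnet(keep_prob=0.5):
    model = Sequential()
    # Layers 1-3: input, scaling to [-1, 1], and cropping 70/25 pixels
    model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=(160, 320, 3)))
    model.add(Cropping2D(cropping=((70, 25), (0, 0))))      # -> 65x320x3
    # Layers 4-8: five convolutional layers, valid padding, ReLU, no pooling
    model.add(Conv2D(24, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
    model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
    # Layers 9-17: flatten, then alternating drop-out and dense layers
    model.add(Flatten())
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(200))
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(100))
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(20))
    model.add(Dropout(1.0 - keep_prob))
    model.add(Dense(1))          # single output neuron: the steering angle
    return model
```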
III. THE TRAINING DATA SET
The following are the two main sources of data utilized to construct the training data set used to train the BCNet:
1) Source 1 - Udacity Supplied Data [12]: This collection, with an unzipped size of 365 MB, consists of 24,108 images equally divided between center, left, and right front-camera shots. Each image is 160x320 pixels with 3 RGB color channels. The index of the data is stored in a CSV file that contains 8,036 lines of records.
2) Source 2 - Simulator Generated Data: collected using the open-source Udacity driving simulator [13]. The recorded data set has an unzipped size of 808 MB and consists of 49,851 images equally divided between center, left, and right front-camera shots. Each image is 160x320 pixels with 3 RGB color channels. The index of the data is stored in a CSV file that contains 16,617 lines of records. The data has been generated by driving the car manually around Track 1 in the mentioned simulator several times (~10 times) with driving behavior as safe as possible. In particular, it is encouraged to include "recovery" data while training. This means that data should be captured starting from the point of approaching the edge of the track (perhaps nearly missing a turn and almost driving off the track) and recording the process of steering the car back toward the center of the track, to give the model a chance to learn recovery behavior.
Several subroutines have been written for data visualization and analysis. These act as a sanity check to verify that the preprocessing is not fundamentally flawed. Flawed data will almost certainly confuse the model and result in unacceptable performance. An example of the output of these subroutines is presented in Fig. 2, which displays a sample of the generated training data (source 2), and Fig. 3, which presents the histogram of the steering-angle values collected during driving (source 2).
The data (both sources 1 and 2) is divided into two separate parts: training data, which represents 80% of the chunk, and validation data, which represents the remaining 20%.
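For illustration, indexing such a CSV log and performing the 80/20 split can be sketched as follows (assuming one record per line with center, left, and right image paths followed by the steering angle; the file name is illustrative):

```python
# A sketch of indexing a driving log and splitting it 80/20.
import csv
from sklearn.model_selection import train_test_split

samples = []
with open('driving_log.csv') as f:
    for row in csv.reader(f):
        # row: [center_path, left_path, right_path, steering, ...]
        center, left, right, steering = row[0], row[1], row[2], float(row[3])
        samples.append((center, left, right, steering))

# 80% training / 20% validation, as described above
train_samples, valid_samples = train_test_split(samples, test_size=0.2)
```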
Fig. 1. The BCNet Architecture.
IV. DRIVING DATA PRE-PROCESSING
Before using the front camera's images in the training/validation data sets, these images need to be pre-processed to make them more useful and convenient throughout the learning process. The pre-processing steps are meant to improve the training results and reduce the computation as much as possible. The following steps describe the implemented pre-processing in order of execution:
1) Normalization (color): this is done for color images using the Lambda function in Keras [9] by simply implementing min-max scaling. The values of the RGB pixels are scaled to the -1 → 1 range and centered on zero instead of the 0 → 255 range.
2) Cropping images: The images have been cropped from the
top by 70 pixels and from the bottom by 25 pixels, in order
to focus on the region of interest (ROI) and to reduce the
number of inputs (faster learning process). The cropped
images have the size of 65x320x3.
3) Flipping images: The data has been doubled (augmented)
by flipping all the images (around the y-axis) and reversing
the sign of the corresponding steering angle. Accordingly,
the source-1 data becomes 48,216 samples, and source-2
data becomes 99,702 samples. In other words, each CSV
line record can generate 6 training samples (center, left,
right, flipped-center, flipped-left, and flipped-right).
4) Jittering images: To minimize the model's tendency to
overfit to the conditions of the test track, images are
"jittered" before being fed to the BCNet. The jittering
consists of a randomized brightness adjustment, a
randomized shadow, and a randomized horizon shift. The
shadow effect is simply a darkening of a random
rectangular portion of the image, starting at either the left
or right edge and spanning the height of the image. The
horizon shift applies a perspective transform beginning at
the horizon line (at roughly 2/5 of the height) and shifting
it up or down randomly by up to 1/8 of the image height.
The horizon shift is meant to mimic the topology
conditions of the test track.
5) Data Distribution Flattening: Because the test track includes long sections with very slight or no curvature, the data captured from it tends to be heavily skewed toward low and zero turning angles. This creates a problem for the neural network, which then becomes biased toward driving in a straight line and can become easily confused by sharp turns. The distribution of the input data can be observed in Fig. 3. To reduce the occurrence of low- and zero-angle data points, a histogram of the turning angles is produced and the average number of samples per bin is computed. Next, a "keep probability" for the samples belonging to each bin is determined. That keep probability is 1.0 for bins that contain fewer than the computed average samples per bin; for the other bins, the keep probability is calculated as the average samples per bin divided by the number of samples in that bin, so that over-represented bins are thinned. Finally, random data points from the data set are removed at a rate of (1 - "keep probability"); a sketch of this procedure appears after the pipeline summary below. The resulting data distribution can be seen in Fig. 4. The distribution is not uniform overall, but it is much closer to uniform for low and zero turning angles. This method helped speed up the training process, as a smaller but higher-quality data set is used.
6) Cleaning the dataset: it was discovered that the model performed especially poorly on certain data points, and those data points were then found to be mislabeled in several cases. A subroutine was created to display the frames from the dataset on which the model performs worst. The intent was to manually adjust the steering angles for the mislabeled frames. Even though this approach is tedious, it helped to improve the results of the training to some extent.
7) Shuffling the training data: this is done once each training epoch to avoid pattern memorization and, consequently, getting trapped in local minima.
8) Using a generator function to load data into memory [14]: this step greatly smooths out the training process and is, in fact, mandatory: loading the whole data set into the computer memory was not possible (or at least not practical). Each batch (size = 128 images and angles) is generated and loaded into memory individually; a minimal generator sketch follows this list. The fit_generator() function from Keras [14] is used to manage the whole process.
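A minimal sketch of such a generator is given below (assuming each sample is an (image path, steering angle) pair produced by the indexing step; this is an illustrative sketch rather than the exact implementation):

```python
# A minimal batch generator in the spirit of step 8: each batch of
# images is read from disk only when needed, so the full data set
# never has to fit in memory at once.
import numpy as np
import matplotlib.image as mpimg
from sklearn.utils import shuffle

def batch_generator(samples, batch_size=128):
    # samples: list of (image_path, steering_angle) pairs
    while True:                          # Keras generators loop forever
        samples = shuffle(samples)       # re-shuffle once per pass
        for i in range(0, len(samples), batch_size):
            batch = samples[i:i + batch_size]
            images = np.array([mpimg.imread(path) for path, _ in batch])
            angles = np.array([angle for _, angle in batch], dtype=np.float32)
            yield images, angles
```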
Fig. 2. A sample of the collected images: center, left and right
respectively.
Finally, the pre-processing pipeline actually used is: Color Image → Normalization → Cropping → Flipping → Jittering → Shuffling → Batch Memory Loading.
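For illustration, the distribution-flattening procedure of step 5 can be sketched as follows (the number of histogram bins is an assumption, as it is not stated above):

```python
# A sketch of the data-distribution flattening in step 5: bins with
# more samples than the per-bin average are randomly thinned.
import numpy as np

def flatten_distribution(samples, angles, num_bins=25):
    angles = np.asarray(angles)
    counts, edges = np.histogram(angles, bins=num_bins)
    avg = counts.mean()
    # keep probability: 1.0 for under-represented bins,
    # (average / count) for over-represented ones
    keep = np.where(counts > avg, avg / np.maximum(counts, 1), 1.0)
    # map each angle to its histogram bin, then draw a keep/remove mask
    bins = np.clip(np.digitize(angles, edges[1:-1]), 0, num_bins - 1)
    mask = np.random.rand(len(angles)) < keep[bins]
    return [s for s, m in zip(samples, mask) if m]
```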
This pipeline proved sufficient and produced the required results. However, other techniques were considered and may be needed for future endeavors, as follows:
1) Converting color images to greyscale: reduces complexity considerably by cutting the size of the training data and the associated computation to one third.
2) Incorporating edge detection and lane finding using the Canny algorithm and the Hough transform.
3) Blur-filtering the input images using Gaussian filtering to remove noise.
Fig. 3. The histogram of the steering angles collected during driving.
Fig. 4. The histogram of the steering angles after flattening.
V. THE LEARNING ALGORITHM
In this work, the Adam learning algorithm [15] is used to train the proposed BCNet and update its weights iteratively based on the prepared training driving data. Adam is an optimization algorithm used instead of the classical stochastic gradient descent (SGD) learning algorithm [16]. The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
Adaptive Moment Estimation (Adam) [15] is a method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients $v_t$, Adam also keeps an exponentially decaying average of past gradients $m_t$, similar to momentum, and the update equations are given as follows:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (1)$$

$m_t$ and $v_t$ are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients respectively, hence the name of the method. As $m_t$ and $v_t$ are initialized as vectors of zeros, the authors of Adam observe that they are biased towards zero, especially during the initial time steps, and especially when the decay rates are small (i.e., $\beta_1$ and $\beta_2$ are close to 1).
These biases have been counteracted by computing bias-corrected first and second-moment estimates as follows:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (2)$$

The parameters are updated using the above estimates, which yields the following Adam update rule:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \qquad (3)$$

for all neural-network model parameters $\theta \in \mathbb{R}^d$ (weights and biases), where $\eta$ is the learning rate and $\epsilon$ is a very small constant that prevents division by zero.
The authors of Adam propose default values of 0.9 for $\beta_1$, 0.999 for $\beta_2$, and $10^{-8}$ for $\epsilon$. They show empirically that Adam works well in practice and compares favorably to other adaptive learning-rate algorithms [17].
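For concreteness, a single Adam update implementing equations (1)-(3) can be sketched in plain numpy as follows (an illustrative sketch, not the Keras internals):

```python
# One Adam update step, following equations (1)-(3).
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based time step of the update
    m = beta1 * m + (1 - beta1) * grad          # eq. (1), first moment
    v = beta2 * v + (1 - beta2) * grad ** 2     # eq. (1), second moment
    m_hat = m / (1 - beta1 ** t)                # eq. (2), bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # eq. (3)
    return theta, m, v
```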
The output layer of the proposed BCNet is a linear regression function [18]. The network models the system as a linear combination of features to produce the final estimated output $\hat{y}$. The function is given by

$$\hat{y} = h(x, w) = \sum_{j=0}^{N-1} w_j x_j + b \qquad (4)$$

where $x_j$ is the $j$th input to the output unit, $w_j$ is the $j$th weight of the $j$th input, $b$ is the bias term, $N$ is the number of inputs to the output unit (in other words, the size of the previous hidden layer), $w = [w_0, w_1, \ldots, w_j, \ldots, w_{N-1}]$ is the vector of weights, and $x = [x_0, x_1, \ldots, x_j, \ldots, x_{N-1}]$ is the vector of inputs.
The main task of the training process is to find the weights that provide the best fit for the training data. One way to measure this fit is to calculate the least-squares error (the data loss) over the training dataset:

$$L(w) = \sum_{i=1}^{M} \left( h(x_i, w) - y_i \right)^2 = \sum_{i=1}^{M} \left( \hat{y}_i - y_i \right)^2 \qquad (5)$$

where $L$ is the data loss function that needs to be minimized using Adam's algorithm, $y_i$ is the $i$th ground-truth sample output, $\hat{y}_i$ is the $i$th output estimate of the neural net, and $M$ is the number of training samples. Gradient descent is then applied to the loss's gradient $\nabla_w L(w)$ in order to minimize the overall error on the training data. Using this, the weights can be updated via the standard gradient-descent rule:

$$w = w - \eta\, \nabla_w L(w) \qquad (6)$$

where $\eta$ is the learning rate.
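In Keras, minimizing the loss of eq. (5) with Adam amounts to compiling the model with a mean-squared-error loss. A sketch, assuming the build_bcnet() and batch_generator() helpers sketched earlier:

```python
# Wiring eq. (5) (mean squared error) to the Adam optimizer in Keras.
from keras.optimizers import Adam

model = build_bcnet(keep_prob=0.5)
model.compile(optimizer=Adam(lr=0.001), loss='mse')
model.fit_generator(batch_generator(train_samples, batch_size=128),
                    steps_per_epoch=len(train_samples) // 128,
                    validation_data=batch_generator(valid_samples, 128),
                    validation_steps=len(valid_samples) // 128,
                    epochs=5)
```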
VI. THE BCNET TRAINING RESULTS
The BCNet model is trained using the parameters listed in
Tables II, III and IV using ADAM’s optimization algorithm.
Fig. 5 shows the setup of the BCNet used during the training
phase, while Fig. 6 shows the setup during the running and
simulation modes. Furthermore, the training results are
presented in Table V, Fig. 7 and Fig. 8. The state of the model
is a bit over-fitting after the training represented by Fig. 8. For
this reason, the learning rate is further reduced and the keep
probability increased.
Fig. 5. Overview of the CNN in training mode.
Fig. 6. Overview of the trained CNN in running mode.
TABLE II. BCNET TRAINING PARAMETERS (LEARNING RATE).

| Algorithm | Parameter | Value | Comment |
|-----------|-----------|-------|---------|
| ADAM Optimization | Learning Rate | 0.001 | For epochs 0-5 (data source 1) and epochs 0-5 (data source 2) |
| | | 0.0005 | For epochs 6-8 (data source 2) |
| | | 0.0002 | For epochs 8-9 (data source 2) |
TABLE III. BCNET TRAINING PARAMETERS (ANGLE CORRECTION).

| Parameter | Value | Comment |
|-----------|-------|---------|
| Left Angle Correction | 0.25 | Radians |
| Right Angle Correction | -0.25 | Radians |
TABLE IV. BCNET TRAINING PARAMETERS (OTHERS).

| Parameter | Value | Comment |
|-----------|-------|---------|
| Batch Size | 120 | For epochs 0-9 |
| Epochs | 15 | The whole training |
| Keep Probability | 0.5 => 0.7 | For drop-out layers |
TABLE V. BCNET TRAINING RESULTS.

| Phase | Data Type | Loss Value | Parameters | Comment |
|-------|-----------|------------|------------|---------|
| Phase 1: Coarse Tuning | Source-1 data | Training: 0.0235; Validation: 0.0205 | 5 epochs, learning rate = 0.001, keep prob. = 0.5 | Fig. 7. Coarse tuning with Udacity data; not enough for full learning. |
| Phase 2: Fine Tuning | Source-2 data | Training: 0.0455; Validation: 0.0411 | 5 epochs, learning rate = 0.001, keep prob. = 0.5 | Fig. 8. Fine tuning with self-collected data; proved enough for full learning with acceptable performance. |
| Phase 3: Fine Tuning | Source-2 data | Training: 0.0417; Validation: 0.0377 | 3 epochs, learning rate = 0.0005, keep prob. = 0.6 | Fig. 9. More fine tuning with self-collected data; caused over-fitting with somewhat inferior performance. |
| Phase 4: Fine Tuning | Source-2 data | Training: 0.0348; Validation: 0.0295 | 2 epochs, learning rate = 0.0002, keep prob. = 0.7 | Fig. 10. More fine tuning with self-collected data; full learning with very good performance. |
Fig. 7. Learning progress for data source 1 (Udacity Data).
Fig. 8. Learning progress for data source 2 (Self-Collected).
Fig. 9. Learning progress for data source 2 (Self-Collected).
Fig. 10. Learning progress for data source 2 (Self-Collected).
The training of the BCNet has been carried out through several trials to achieve the results presented in Table V. The following observations were collected during the training process:
1) Training the network using only the "source-1" Udacity-supplied data was tried several times, incorporating several ways of data augmentation; however, acceptable results were not achieved, and the car always hit the borders.
2) Using the Udacity driving simulator [13], training data has been collected by maneuvering the car with a keyboard or a joystick. Accordingly, useful data has been successfully collected for training by looping the car around, for example, "Track 1" several times (~10).
3) After coarse-tuning the model using the "source-1" data, the resulting model weights are reused for further training and fine-tuning on the "source-2" data, using an Adam learning rate of 0.001 as in Table V and Fig. 8. This coarse tuning followed by fine-tuning resembles the transfer-learning approach; a sketch of the staged schedule follows this list. Note that the two types of data are never used together. After this fine-tuning phase, the resulting model is tested on "Track 1" in the simulator and produces acceptable results (no unsafe or sudden maneuvering).
4) In order to improve the performance further, the learning rate of the Adam optimizer was halved to 0.0005 and the model was trained for a further 3 epochs. However, this resulted in an over-fitted model, as shown in Fig. 9. Testing confirmed the inferior performance, even though both the training and validation losses are lower than in the previous case. Consequently, this model was set aside for further fine-tuning.
5) The learning rate of the Adam optimizer was reduced further to 0.0002, the keep probability was increased to 0.7, and the model was trained for an extra 2 epochs (Fig. 10). The resulting model was then tested on "Track 1" and produced very good performance.
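The staged schedule above (phases 2-4 on the source-2 data) can be sketched as repeated re-compilation with a smaller Adam learning rate; the keep-probability change between phases is omitted for brevity, and checkpoint file names are illustrative:

```python
# A sketch of the coarse-then-fine schedule of Table V (phases 2-4):
# re-compile with a smaller Adam learning rate and continue training
# from the current weights.
from keras.optimizers import Adam

for lr, epochs in [(0.001, 5), (0.0005, 3), (0.0002, 2)]:
    model.compile(optimizer=Adam(lr=lr), loss='mse')
    model.fit_generator(batch_generator(train_samples, batch_size=128),
                        steps_per_epoch=len(train_samples) // 128,
                        epochs=epochs)
    model.save_weights('bcnet_phase_lr_{}.h5'.format(lr))  # checkpoint
```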
VII. SHORTCOMINGS OF THE IMPLEMENTED APPROACH
The following list summarizes the identified shortcomings:
1) The presented neural network model doesn't have a memory; it makes momentary decisions and doesn't build on previous states to make the current decision. However, driving is believed to be a sequential process, and the current approach doesn't mimic that.
2) After training the network on one track and testing it on another (considerably different from the first one), it may produce unacceptable results in some scenarios in terms of driving behavior, as it has never gone through these scenarios before. Accordingly, this approach may require the network to be exposed to a massive number of tracks in order to generalize well for actual street deployment (commercial application).
VIII. SUGGESTED IMPROVEMENTS
The following points summarize the suggested
improvements:
1) Other network topologies with memory, like Long Short-Term Memory (LSTM) models, need to be tested for behavior-cloning end-to-end learning.
2) The network needs to be trained on many more tracks, maneuvering scenarios, and road conditions in order to make it generalize as much as possible.
3) More useful data can be generated from the currently collected data by adding random distortion, brightness manipulation, jitter, rotation, etc.
4) Applying the concept of finite impulse response (FIR) filtering, or the moving-average concept, to the steering-angle estimate before issuing the final steering command, instead of using the raw estimated value directly. In such a case, the new estimate will depend on the previous history as well; a minimal sketch follows this list.
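For illustration, suggestion 4 can be sketched as an equal-weight moving-average (FIR) filter over the last few steering estimates (the window length is an assumed value):

```python
# Moving-average (FIR) smoothing of the raw steering estimate, as in
# suggestion 4. The window length is an assumed illustrative value.
from collections import deque

class SteeringSmoother:
    def __init__(self, window=5):
        self.history = deque(maxlen=window)   # last `window` raw estimates

    def smooth(self, raw_angle):
        self.history.append(raw_angle)
        # equal-weight FIR filter: average over the recent history
        return sum(self.history) / len(self.history)
```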
IX. CONCLUSION
In this paper, a CNN-based safe steering controller, "BCNet", has been proposed. The architecture of the CNN is presented in detail. The structure of the comprehensive training, validation, and testing data is described. The involved image-processing algorithms have been described as well, and their contributions are analyzed. The BCNet has shown that it is able to learn the entire task of lane and road following without manual decomposition into road or lane-marking detection, semantic abstraction, path planning, and control. A small amount of training data from one or two tracks was sufficient to train the car to drive safely on multiple tracks. The CNN is able to learn meaningful road features from a very sparse training signal (steering alone). It has been shown throughout the training process that the quality of data (much more than quantity) is crucial for this application. Therefore, a comprehensive pipeline of training-data pre-processing has been carefully implemented.
Moreover, the shortcomings of the proposed approach have been discussed, with improvement actions for future work proposed and elaborated. The presented solution constitutes a cornerstone in facilitating the existence of fully autonomous cars in the near future.
REFERENCES
[1] Karim Mansour, Wael Farag, “AiroDiag: A Sophisticated Tool that
Diagnoses and Updates Vehicles Software Over Air”, IEEE Intern.
Electric Vehicle Conference (IEVC), TD Convention Center
Greenville, SC, USA, March 4, 2012, ISBN: 978-1-4673-1562-3.
[2] Wael Farag, “CANTrack: Enhancing automotive CAN bus security
using intuitive encryption algorithms”, 7th Inter. Conf. on Modeling,
Simulation, and Applied Optimization (ICMSAO), UAE, March 2017.
[3] Long Chen, Qingquan Li, Ming Li, Qingzhou Mao, “Traffic sign
detection and recognition for intelligent vehicle”, IEEE Intelligent
Vehicles Symposium, June 2011, Baden-Baden, Germany.
[4] J. Greenhalgh and M. Mirmehdi, “Real-Time Detection and
Recognition of Road Traffic Signs”, IEEE trans. on intelligent
transportation systems, 13(4), Dec. 2012.
[5] Á. Arcos-García, J.A. Álvarez-García, L.M. Soria-Morillo, “Deep
neural network for traffic sign recognition systems: An analysis of
spatial transformers and stochastic optimisation methods”, Neural
Networks 99 (2018) 158–165, Elsevier.
[6] Wael Farag, Zakaria Saleh, "Traffic Signs Identification by Deep
Learning for Autonomous Driving", Smart Cities Symposium (SCS'18),
Bahrain, 22-23 April 2018.
[7] Wael Farag, “Recognition of traffic signs by convolutional neural nets
for self-driving vehicles”, International Journal of Knowledge-based
and Intelligent Engineering Systems, IOS Press, Vol: 22, No: 3, pp. 205
– 214, 2018.
[8] M Bojarski, D Del Testa, D Dworakowski, B Firner, B Flepp, P Goyal,
... et al., “End to End Learning for Self-Driving Cars”,
arXiv:1604.07316, 25 Apr 2016.
[9] Keras Documentation, https://keras.io/.
[10] TensorFlow, https://www.tensorflow.org/.
[11] Python, https://www.python.org/.
[12] Udacity Sample Training Data, https://d17h27t6h515a5.cloudfront.net/topher/2016/December/584f6edd_data/data.zip.
[13] Udacity Simulator, https://github.com/udacity/self-driving-car-sim.
[14] Shervine Amidi, https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.html.
[15] D.P. Kingma, J. Ba, “Adam: A Method for Stochastic Optimization”,
3rd Inter. Conf. for Learning Representations, San Diego, USA, 2015.
[16] Léon Bottou, "Online Algorithms and Stochastic Approximations",
Online Learning and Neural Nets, Cambridge Univ. Press, ISBN 978-
0-521-65263-6, (1998).
[17] Sebastian Ruder, “An overview of gradient descent optimization
algorithms”, arXiv:1609.04747v2, 15 Jun 2017.
[18] Wael Farag, “Synthesis of intelligent hybrid systems for modeling and
control”, University of Waterloo, Canada, 1998.