Conference Paper

Fast parking control of mobile robot based on multi-layer neural network on homogeneous architecture

Abstract

Today, the problem of designing a multiprocessor architecture tailored to target neural network applications raises the need for a fast and efficient MP-SoC (Multiprocessor System-on-Chip) design environment. Additionally, implementing such applications on multiprocessor designs requires exploiting the parallelism and pipelining in the algorithms in the hope of delivering significant reductions in execution time. To take advantage of parallelization on a homogeneous multiprocessor architecture and to reduce the programming effort, we provide a new MP-SoC design methodology that offers more opportunities for accelerating the parallelization of neural network algorithms. The efficiency of this approach is tested on several example applications. This work is devoted to the design and implementation of a complete intelligent parking controller for an autonomous mobile robot based on multi-layer feed-forward neural networks. To address the specific requirements of implementing such an algorithm, we propose a new parallel pipelined architecture composed of several computational stages. We further suggest a parallel software skeleton, "SCComCM", intended to be employed by the developed multistage architecture. The experimental results show that the proposed parallel architecture achieves better speed-up, less communication time, and a better space reduction factor than a hand-tuned hardware design.
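As a rough illustration of how an SCComCM-style (Split, Compute, Communicate all-to-all, Compute, Merge) skeleton can map one feed-forward layer onto communicating processors, here is a minimal Python/mpi4py sketch. The paper targets an MP-SoC, not an MPI cluster; the function name, sigmoid activation, and data layout below are our assumptions for illustration only.

```python
# Minimal sketch of an SCComCM-style skeleton for one feed-forward layer,
# written with mpi4py for illustration only (the paper targets an MP-SoC).
# All names and sizes here are illustrative assumptions.
import numpy as np
from mpi4py import MPI

def sccomcm_layer(x, W, b, comm):
    """Evaluate sigmoid(W @ x + b) cooperatively across comm.size nodes."""
    rank, size = comm.Get_rank(), comm.Get_size()
    # Split: each node takes a slice of the input vector and the
    # matching columns of the weight matrix.
    part = np.array_split(np.arange(x.size), size)[rank]
    # Compute: partial weighted sums from the local input slice.
    partial = W[:, part] @ x[part]
    # Communicate all-to-all: sum the partial contributions on every node.
    total = np.empty_like(partial)
    comm.Allreduce(partial, total, op=MPI.SUM)
    # Compute: apply bias and activation locally.
    y = 1.0 / (1.0 + np.exp(-(total + b)))
    # Merge: here every node ends up holding the full layer output.
    return y

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    rng = np.random.default_rng(0)          # same seed on every rank
    x, W, b = rng.standard_normal(8), rng.standard_normal((4, 8)), np.zeros(4)
    y = sccomcm_layer(x, W, b, comm)        # collective: all ranks call it
    if comm.Get_rank() == 0:
        print(y)
```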

... Moreover, the processes of distributing the input data and gathering the output data need to be included in the skeleton's definition, both handled by the same processing node. Other skeletons are derived from this one, such as SCComCM (Split, Compute, Communication all-to-all, Compute, Merge) and SCComMC (Split, Compute, Communication all-to-all, Merge, Compute), introduced for the parallel implementation of the multilayer neural network [13] and the dynamic neural field (DNF), respectively. Nevertheless, some image processing applications require irregular data-set processing, for instance an arbitrary list of windows of different sizes changing at each iteration (image). ...
Article
Full-text available
Today, the problem of designing a multiprocessor architecture tailored to a target application field raises the need for a fast and efficient multiprocessor system-on-chip (MPSoC) design environment. Additionally, implementing image processing applications on an MPSoC system requires exploiting the parallelism and pipelining in the algorithms in the hope of delivering significant reductions in execution time. To take advantage of parallelization on homogeneous MPSoCs and to reduce the programming effort, the proposed design methodology offers more opportunities for accelerating the parallelization of sequential image processing algorithms on a pipeline architecture. Our approach provides a rapid prototyping tool in the form of a graphic programming environment (CubeGen). Further, it offers a set of parallel software skeletons as a communication library, providing a software abstraction that enables quick implementation of complex image processing applications on a field-programmable gate array (FPGA) platform. The design of a homogeneous network of communicating processors is presented from the hardware and software specification down to a synthesizable hardware description. We then extend our approach to support more complex applications by implementing a soft multiprocessor for a 'multihypotheses model-driven approach for road recognition' and show the impact of various hardware and software configuration choices in matching specific application needs. Using images of a real road scene, the performance results of the road recognition algorithm on a Xilinx Virtex-6 FPGA platform not only achieve the desired latency but also further improve the tracking performance, which depends mainly on the number of hypotheses.
Conference Paper
Full-text available
This article discusses the design of an application-specific MP-SoC (Multi-Processor System on Chip) architecture dedicated to a face tracking algorithm. The proposed algorithm tracks a Region-Of-Interest (ROI) by determining similarity measures between the reference and target frames. In our approach, this measure is an estimate of the Kullback-Leibler divergence in the K-nearest-neighbor (KNN) framework. The metric between pixels is a Euclidean norm in a joint geometric and radiometric space. The adopted measure allows us to check whether the regions have similar colors and also whether these colors appear at the same locations. Considering the necessary amount of computation, we propose a parallel hardware implementation of the developed algorithm on an MP-SoC architecture. Creating multiple processors in one system is hard for software developers using traditional hardware design approaches because of the complexity of designing software models suitable for such FPGA implementations. To deal with this problem, we have introduced the CubeGen tool to spare the designer tedious manual editing operations. This methodology enables us to instantiate a generic Homogeneous Network of Communicating Processors (called HNCP) tailored to our targeted application. Our implementations are demonstrated using the Xilinx FPGA chip XC6VLX240T.
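The similarity measure described above, a KNN estimate of the Kullback-Leibler divergence in a joint geometric/radiometric space, can be sketched in a few lines of Python. The estimator below is the standard k-nearest-neighbour construction, not necessarily the paper's exact formulation; the helper names, the choice k=5, and the (x, y, R, G, B) embedding are our assumptions.

```python
# Hedged sketch: kNN estimate of D(ref || tgt) between two pixel sets,
# each pixel embedded as an (x, y, R, G, B) feature row. Illustrative only.
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(ref, tgt, k=5):
    """Estimate D(ref || tgt) from samples; ref, tgt are (n, d) arrays."""
    n, d = ref.shape
    m = tgt.shape[0]
    # Distance to the k-th neighbour within ref (index 0 is the point itself).
    rho = cKDTree(ref).query(ref, k=k + 1)[0][:, k]
    # Distance to the k-th neighbour in tgt.
    nu = cKDTree(tgt).query(ref, k=k)[0][:, k - 1]
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))

def to_joint_space(patch, top_left, colour_scale=1.0):
    """Flatten an (h, w, 3) image patch into (x, y, R, G, B) feature rows."""
    h, w, _ = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel() + top_left[0], ys.ravel() + top_left[1]], 1)
    return np.hstack([coords, colour_scale * patch.reshape(-1, 3)])
```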
Article
Full-text available
There is renewed interest in computational intelligence, due to advances in algorithms, neuroscience, and computer hardware. In addition, there is enormous interest in autonomous vehicles (air, ground, and sea) and robotics, which need significant onboard intelligence. Work in this area could lead not only to a better understanding of the human brain but also to very useful engineering applications. The functioning of the human brain is not well understood, but enormous progress has been made in understanding it and, in particular, the neocortex. There are many reasons to develop models of the brain. Artificial Neural Networks (ANN), one type of model, can be very effective for pattern recognition, function approximation, scientific classification, control, and the analysis of time series data. ANNs often use the back-propagation algorithm for training and can require large training times, especially for large networks, but there are many other types of ANNs. Once the network is trained for a particular problem, however, it can produce results in a very short time. Parallelization of ANNs could drastically reduce the training time. An object-oriented, massively parallel ANN software package, SPANN (Scalable Parallel Artificial Neural Network), has been developed and is described here. MPI was used to parallelize the C++ code. Only the neurons on the edges of the domains were involved in communication, in order to reduce communication costs and maintain scalability. The back-propagation algorithm was used to train the network. In preliminary tests, the software was used to identify character sets. The code correctly identified all the characters when adequate training was used in the network. The code was run on up to 500 Intel Itanium processors with 25,000 neurons and more than 2 billion neuron weights. Various comparisons of training time, forward propagation time, and error reduction were also made.
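The "edge neurons only" communication pattern mentioned above is essentially a halo exchange. As a hedged sketch (the actual SPANN decomposition is not detailed here), assume each rank owns a contiguous 1-D strip of neurons and swaps a single boundary activation with each neighbour:

```python
# Hedged sketch of edge-only communication: each rank owns a 1-D strip of
# neurons and exchanges just its boundary activations with its two
# neighbours. The 1-D layout and all names are illustrative assumptions.
import numpy as np
from mpi4py import MPI

def exchange_edges(local_act, comm):
    """Swap boundary activations with the left/right neighbour ranks."""
    rank, size = comm.Get_rank(), comm.Get_size()
    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL
    halo_l, halo_r = np.zeros(1), np.zeros(1)   # stay zero at the borders
    # Only the first/last neuron of each strip crosses a rank boundary.
    comm.Sendrecv(local_act[:1], dest=left, recvbuf=halo_r, source=right)
    comm.Sendrecv(local_act[-1:], dest=right, recvbuf=halo_l, source=left)
    return halo_l[0], halo_r[0]

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    acts = np.full(4, float(comm.Get_rank()))   # toy local activations
    print(comm.Get_rank(), exchange_edges(acts, comm))
```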
Article
Full-text available
This article presents a comprehensive overview of the hardware realizations of artificial neural network (ANN) models, known as hardware neural networks (HNN), appearing in academic studies as prototypes as well as in commercial use. HNN research has witnessed steady progress for more than two decades, though commercial adoption of the technology has been relatively slower. We study the overall progress in the field across all major ANN models, hardware design approaches, and applications. We outline the underlying design approaches for mapping an ANN model onto compact, reliable, and energy-efficient hardware entailing computation and communication, and survey a wide range of illustrative examples. Chip design approaches (digital, analog, hybrid, and FPGA based), both at the neuronal level and as neurochips realizing complete ANN models, are studied. We discuss in detail neuromorphic designs, including spiking neural network hardware, cellular neural network implementations, reconfigurable FPGA-based implementations (in particular for stochastic ANN models), and optical implementations. Parallel digital implementations employing bit-slice, systolic, and SIMD architectures, implementations of associative neural memories, and RAM-based implementations are also outlined. We trace the recent trends and explore potential future research directions.
Article
Full-text available
In this paper we present a platform for evolving spiking neural networks on FPGAs. Embedded intelligent applications require both high performance, so as to exhibit real-time behavior, and flexibility, to cope with adaptivity requirements. While hardware solutions offer performance and software solutions offer flexibility, reconfigurable computing sits between these two, providing a trade-off between flexibility and performance. Our platform is described as a combination of three parts: a hardware substrate, a computing engine, and an adaptation mechanism. We also present results on the performance and synthesis of the neural network implementation on an FPGA.
Conference Paper
Full-text available
Models, architectures and languages for parallel computation have been of utmost research interest in computer science and engineering for several decades. A great variety of parallel computation models has been proposed and studied, and different parallel and distributed architectures designed as possible ways of harnessing parallelism and improving the performance of general-purpose computers. Massively parallel connectionist models such as artificial neural networks (ANNs) and cellular automata (CA) have been studied primarily in domain-specific contexts, namely learning and complex dynamics, respectively. However, they can also be viewed as generic abstract models of massively parallel computers that are in many respects fundamentally different from the "mainstream" parallel and distributed computation models. We compare and contrast herewith the parallel computers as they have been built by engineers with those built by Nature. We subsequently venture into a high-level discussion of the properties and potential advantages of the proposed massively parallel computers of the future, which would be based on fine-grained connectionist parallel models rather than on the various multiprocessor architectures or networked distributed systems that are the two main architectural paradigms in building parallel computers of the late 20th and early 21st centuries. The comparisons and contrasts herein focus on the fundamental conceptual characteristics of the various models rather than on any particular engineering idiosyncrasies, and are carried out at both the structural and functional levels. The fundamental distinctions between fine-grain connectionist parallel models and their "classical" coarse-grain counterparts are discussed, and some important expected advantages of hypothetical massively parallel computers based on the connectionist paradigms are conjectured. We conclude with some brief remarks on the role that the paradigms, concepts, and design ideas originating from connectionist models have already had in existing parallel design, and on what further role connectionist models may have in the foreseeable future of parallel and distributed computing.
Article
Nowadays, techniques based on artificial neural networks are attracting increasing interest in the fields of control and robotics. The processing speed, the ability to learn and adapt, and the robustness of these approaches motivate this work. For this system to be embedded in a wheelchair, it is imperative to respect the functional constraints as well as those of resource allocation, weight, power consumption, cost… Conceiving an embedded system is thus ultimately an exercise in optimization: minimizing production costs for optimal functionality. The objective of this work is the FPGA implementation of an optimal neural network architecture.
Article
We present a technique for parallelizing the training of neural networks. Our technique is designed for parallelization on a cluster of workstations. To take advantage of parallelization on clusters, a solution must account for the higher network latencies and lower bandwidths of clusters as compared to custom parallel architectures. Parallelization approaches that may work well on special-purpose parallel hardware, such as distributing the neurons of the neural network across processors, are not likely to work well on cluster systems because communication costs to process a single training pattern are too prohibitive. Our solution, Pattern Parallel Training, duplicates the full neural network at each cluster node. Each cooperating process in the cluster trains the neural network on a subset of the training set each epoch. We demonstrate the effectiveness of our approach by implementing and testing an MPI version of Pattern Parallel Training for the eight-bit parity problem. Our results show a significant speed-up in training time as compared to sequential training. In addition, we analyze the communication costs of our technique and discuss which types of common neural network problems would benefit most from our approach.
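A minimal mpi4py sketch of the Pattern Parallel Training scheme just described: every rank holds an identical network replica, trains on its own pattern subset, and one Allreduce per epoch sums the gradient contributions. To keep it short we use a single-layer logistic stand-in, which (unlike the paper's MLP) cannot actually solve parity; the sketch shows only the communication pattern, and all names are our assumptions.

```python
# Hedged sketch of Pattern Parallel Training: full network replica per
# rank, disjoint pattern subsets, one gradient Allreduce per epoch.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rng = np.random.default_rng(0)          # same seed -> identical replicas

# Eight-bit parity problem from the paper: 256 patterns, one output bit.
X = ((np.arange(256)[:, None] >> np.arange(8)) & 1).astype(float)
y = X.sum(axis=1) % 2

W = rng.standard_normal((8, 1)) * 0.1   # single-layer stand-in, not an MLP
lr = 0.5
mine = np.arange(rank, 256, size)       # this rank's pattern subset

for epoch in range(200):
    z = 1 / (1 + np.exp(-X[mine] @ W))                  # local forward pass
    grad_local = X[mine].T @ (z - y[mine, None]) / 256  # local gradient part
    grad = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad, op=MPI.SUM)        # one exchange/epoch
    W -= lr * grad                      # identical update on every rank
```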
Article
We present a parallel implementation of artificial neural networks on the Connection Machine CM-5 and compare it with other parallel implementations on SIMD and MIMD architectures. This parallel implementation was developed with the goal of efficiently training large neural networks with huge training pattern sets for applications in molecular biology, in particular the prediction of coding regions in DNA sequences. The implementation uses training-pattern parallelism and makes use of the parallel I/O facilities of the CM-5 and the efficient reduction operations available within its control network to achieve high scalability. The parallel simulator attains a maximum speed of 149.25 MCUPS when training feed-forward networks with backpropagation on a 512-processor CM-5 system without using the CM-5 vector facility. The implementation places no restriction on the type of network topology and works with different batch training algorithms such as BP, Quickprop, and Rprop.
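For context, MCUPS counts millions of connection (weight) updates per second during training. A tiny worked example of the metric, with purely illustrative numbers that are not taken from the paper:

```python
# MCUPS = (weights updated per pattern * patterns per epoch * epochs)
#         / (training time in seconds * 1e6). Numbers are illustrative.
weights = 50_000          # connections in the network
patterns = 10_000         # training patterns per epoch
epochs = 100
seconds = 3_350.0         # measured wall-clock training time
mcups = weights * patterns * epochs / (seconds * 1e6)
print(f"{mcups:.2f} MCUPS")   # ~14.93 MCUPS
```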
Article
This paper presents a hardware implementation of multilayer feedforward neural networks (NN) using reconfigurable field-programmable gate arrays (FPGAs). Despite improvements in FPGA densities, the numerous multipliers in an NN limit the size of the network that can be implemented using a single FPGA, making NN applications commercially unviable. The proposed implementation is aimed at reducing the resource requirement, without much compromise in speed, so that a larger NN can be realized on a single chip at a lower cost. The sequential processing of the layers in an NN is exploited in this paper to implement large NNs using a method of layer multiplexing. Instead of realizing a complete network, only the single largest layer is implemented. The same layer behaves as different layers with the help of a control block. The control block ensures proper functioning by assigning the appropriate inputs, weights, biases, and excitation function of the layer that is currently being computed. Multilayer networks have been implemented using the Xilinx FPGA "XCV400hq240". The concept is shown to be very effective in reducing resource requirements at the cost of a moderate speed overhead. This implementation is proposed to make NN applications viable in terms of cost and speed for online applications. An NN-based flux estimator is implemented on the FPGA and the results obtained are presented.
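The layer-multiplexing idea can be sketched in software: one shared compute stage is reused for every layer while a control loop routes in that layer's weights, biases, and excitation function. The toy sizes and names below are our assumptions; in hardware, smaller layers would be padded to the width of the largest layer, which we elide here.

```python
# Hedged sketch of layer multiplexing: a single shared MAC + activation
# stage is time-multiplexed across layers by a control sequence.
import numpy as np

def run_multiplexed(x, layers):
    """layers: list of (W, b, activation); one shared compute stage."""
    act = x
    for W, b, f in layers:        # the control block sequences the layers
        # The same hardware stage would execute this MAC + activation;
        # only the routed-in parameters change between steps.
        act = f(W @ act + b)
    return act

sigmoid = lambda v: 1 / (1 + np.exp(-v))
rng = np.random.default_rng(1)
# Largest layer is 6 wide; smaller layers would be zero-padded to it.
net = [(rng.standard_normal((6, 4)), np.zeros(6), sigmoid),
       (rng.standard_normal((2, 6)), np.zeros(2), sigmoid)]
print(run_multiplexed(rng.standard_normal(4), net))
```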
Article
This work presents an efficient mapping scheme for the multilayer perceptron (MLP) network trained using the back-propagation (BP) algorithm on a network of workstations (NOWs). A hybrid partitioning (HP) scheme is used to partition the network, and each partition is mapped onto processors in the NOW. We derive the processing time and memory space required to implement the parallel BP algorithm on NOWs. Performance parameters such as speed-up and space reduction factor are evaluated for the HP scheme and compared with earlier work involving a vertical partitioning (VP) scheme for mapping the MLP on NOWs. The performance of the HP scheme is evaluated by solving an optical character recognition (OCR) problem on a network of ALPHA machines. The analytical and experimental performance shows that the proposed parallel algorithm has better speed-up, less communication time, and a better space reduction factor than the earlier algorithm. This work also presents a simple and efficient static mapping scheme for heterogeneous systems. Using divisible load scheduling theory, a closed-form expression for the number of neurons assigned to each processor in the NOW is obtained. Analytical and experimental results for the static mapping problem on NOWs are also presented.
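The paper's closed-form divisible-load expression is not reproduced here; as a hedged stand-in, the sketch below assigns neurons to heterogeneous processors in inverse proportion to their per-neuron processing times, the standard proportional split under divisible load scheduling, so that all processors finish a layer at roughly the same time. All names are illustrative.

```python
# Hedged sketch of a divisible-load style static mapping: neuron counts
# proportional to processor speed (inverse per-neuron time).
import numpy as np

def assign_neurons(n_neurons, per_neuron_time):
    """Return an integer neuron count per processor summing to n_neurons."""
    w = 1.0 / np.asarray(per_neuron_time, dtype=float)   # processor speeds
    share = n_neurons * w / w.sum()                      # ideal real split
    counts = np.floor(share).astype(int)
    # Hand out leftover neurons to the largest fractional remainders.
    for i in np.argsort(share - counts)[::-1][: n_neurons - counts.sum()]:
        counts[i] += 1
    return counts

print(assign_neurons(100, [1.0, 2.0, 4.0]))  # fastest machine gets the most
```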