Fig 1 - uploaded by Abhisek Ukil
Multi-layer Perceptron model.

Source publication
Article
Full-text available
This paper presents the development and implementation of a generalized backpropagation multilayer perceptron (MLP) architecture described in VLSI hardware description language (VHDL). The development of hardware platforms has been complicated by the high hardware cost and quantity of the arithmetic operations required in online artificial neural n...

Context in source publication

Context 1
... perceptrons (MLPs) (Fig. 1) are layered fully connected feed-forward networks. That is, all PEs (Fig. 2) in two consecutive layers are connected to one another in the forward ...

Similar publications

Article
Full-text available
This paper presents a hardware implementation of neural networks on reprogrammable devices (FPGAs). The LIRMM board was used for this implementation; it allows developing both hardware and software designs, using C and VHDL languages, respectively. A feedforward neural network was implemented, using backpropagation learning. Emphasis is give...

Citations

... The use of EDA tools and techniques for electronic systems design facilitates easy development and hardware implementation of intelligent controllers employing complex algorithms, leading to AI-based designs for industrial applications [36][37][38][39][40]. Their quick route to hardware and the flexibility offered, especially through embedded system-on-chips, enable hardware/software co-design and implementation [41]. ...
Article
Full-text available
This paper reviews the evolution of methodologies and tools for modeling, simulation, and design of digital electronic system-on-chip (SoC) implementations, with a focus on industrial electronics applications. Key technological, economic, and geopolitical trends are presented at the outset, before reviewing SoC design methodologies and tools. The fundamentals of SoC design flows are laid out. The paper then exposes the crucial role of the intellectual property (IP) industry in the relentless improvements in performance, power, area, and cost (PPAC) attributes of SoCs. High abstraction levels in design capture and increasingly automated design tools (e.g., for verification and validation, synthesis, place, and route) continue to push the boundaries. Aerospace and automotive domains are included as brief case studies. This paper also presents current and future trends in SoC design and implementation, including the rise, evolution, and usage of machine learning (ML) and artificial intelligence (AI) algorithms, techniques, and tools, which promise even greater PPAC optimizations.
... Nevertheless, the number of layers in this work is not adjustable and the configuration steps are complex. To overcome these limitations, a more flexible FPGA implementation method for neural networks was proposed in [25]. This highly parameterized method stands out for the convenience it offers for parameterization. ...
... The design of the activation function is our previous work, so it is only briefly introduced here. The activation function is realized through a hybrid method: its fast-changing region is approximated by a lookup table with interpolation [25], and its slow-changing region is realized using the range-addressable lookup table method [37]. The optimal data bit-width of the method is selected automatically according to the expected accuracy. ...
Article
Full-text available
An efficient on-chip learning method based on neuron multiplexing is proposed in this paper to address the limitations of traditional on-chip learning methods, including low resource utilization and non-tunable parallelism. The proposed method utilizes a configurable neuron calculation unit (NCU) to calculate neural networks at different degrees of parallelism by multiplexing NCUs at different levels; resource utilization can be increased by reducing the number of NCUs, since resource consumption is predominantly determined by the number of NCUs and the data bit-width, which are decoupled from the specific topology. To better support the proposed method and minimize RAM block usage, a weight segmentation and recombination method is introduced, accompanied by a detailed explanation of the access order. Moreover, a performance model is developed to facilitate the parameter selection process. Experimental results conducted on an FPGA development board demonstrate that the proposed method has lower resource consumption, higher resource utilization, and greater generality compared to other methods.
... Some algorithms have been developed for FPGA-based onsite training. Back-propagation (BP) [8][9][10] has been implemented on FPGA for neural network (NN) online training with good performance, although BP is essentially a gradient-descent method with slow convergence speed. An implementation of quasi-Newton (QN) algorithms for NN training in a deeply pipelined design achieves a 17× higher performance over CPU [11]. ...
Conference Paper
Conjugate gradient (CG) is widely used in training sparse neural networks. However, CG, involving a large number of sparse matrix and vector operations, cannot be efficiently implemented on resource-limited edge devices. In this paper, a high-performance and energy-efficient CG accelerator implemented on an edge field-programmable gate array is proposed for fast onsite neural network training. Based on profiling, we propose a unified matrix multiplier that is compatible with both sparse and dense matrices. We also design a novel T-engine to handle the transpose operation with the compressed sparse format. Experimental results show that our proposal outperforms the state-of-the-art FPGA work with a resource reduction of up to 41.3%. In addition, we achieve on average 10.2× and 2.0× speedup, as well as 10.1× and 3.5× better energy efficiency, compared with implementations on CPU and GPU, respectively.
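For context on what such an accelerator speeds up, here is a minimal pure-Python sketch of the conjugate gradient method for a symmetric positive-definite system. The matrix-vector product inside the loop is the operation that dominates the cost; all names and the example system are illustrative, not taken from the cited paper.

```python
def conjugate_gradient(A, b, max_iters=50, tol=1e-12):
    """Minimal CG for A x = b with A symmetric positive-definite.

    Each iteration is dominated by one matrix-vector product (A @ p),
    which is exactly the operation hardware accelerators optimize."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual r = b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iters):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

# 2x2 SPD system; the exact solution is (1/11, 7/11)
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In exact arithmetic CG converges in at most n iterations for an n-dimensional system, which is why it suits fast onsite training when the matrix operations are mapped to hardware.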
... The Tanh function is defined in equation (1) as: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) (1). Physical implementation of equation (1) presents two difficulties: the exponential terms and the division. There are many approaches to dealing with the problems associated with approximating the exponential. ...
... These include piecewise linear approximation [1], [2], [4], as well as look-up tables (LUTs) and range-addressable look-up tables (RALUTs) [3], [5]. ...
... To reduce the implementation cost, conversion to powers of two was recommended in [1], as shown in equation (4). Table 1 shows the error of the power-of-two-approximated Tanh as a function of the word length. The implementation of the approximated Tanh with powers of two was carried out in Xilinx Vivado 2019 and simulated with ModelSim (Mentor edition) and the simulation waveform editor included in ModelSim-Altera. ...
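The general idea of trading the exponential for shift-friendly powers of two can be sketched as follows; this is an illustrative assumption of the technique (quantizing the base-2 exponent), not the exact scheme or coefficients of [1].

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e)

def tanh_pow2(x, frac_bits=4):
    # tanh(x) = (e^(2x) - 1) / (e^(2x) + 1); rewrite e^(2x) as
    # 2^(2x*log2(e)) and quantize the exponent to frac_bits fractional
    # bits, so 2**q maps to shift-based evaluation in hardware.
    scale = 1 << frac_bits
    q = round(2.0 * x * LOG2E * scale) / scale
    p = 2.0 ** q
    return (p - 1.0) / (p + 1.0)

# maximum absolute error vs math.tanh over [-4, 4]
err = max(abs(tanh_pow2(i / 100.0) - math.tanh(i / 100.0))
          for i in range(-400, 401))
```

With 4 fractional exponent bits the error stays on the order of 10^-2, and increasing `frac_bits` trades accuracy against word length, mirroring the error-vs-word-length trade-off the snippet above refers to.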
Conference Paper
The hyperbolic tangent (Tanh) activation function is used in multilayered artificial neural networks (ANNs). This activation function contains exponential and division terms in its expression, which makes its accurate digital implementation difficult. In this paper we present two different approximation techniques for digital implementation of the Tanh function, using the power-of-two and coordinate rotation digital computer (CORDIC) methods. A comparative study of both techniques in terms of approximation accuracy, hardware cost, and speed when implemented on FPGA is also presented.
... Moreover, the use of higher data precision may result in increased resource utilization, since more flip-flops and logic gates are required to store and manipulate larger data representations. The nonlinear activation function (sigmoid or hyperbolic tangent) in each neuron cannot be implemented directly in hardware, since both the division and exponential operations require considerable time and hardware resources [31]. The remaining practical hardware approach is to approximate the function [32,33]. ...
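One widely published example of such an approximation is the PLAN piecewise-linear sigmoid (Amin, Curtis and Hayes-Gill), sketched below; it is offered here as an illustration of the approach, not as the specific scheme of [32,33].

```python
import math

def sigmoid_plan(x):
    # PLAN piecewise-linear sigmoid: every slope and intercept is a
    # sum of powers of two, so hardware needs only shifts and adds --
    # no division and no exponential.
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625
    else:
        y = 0.25 * ax + 0.5
    return y if x >= 0.0 else 1.0 - y

# maximum absolute error vs the true sigmoid over [-8, 8]
err = max(abs(sigmoid_plan(i / 100.0) - 1.0 / (1.0 + math.exp(-i / 100.0)))
          for i in range(-800, 801))
```

The maximum error of PLAN is below 0.02, which is often acceptable for inference while removing both of the costly operations named above.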
Article
This paper proposes a serial hardware architecture of a multilayer perceptron (MLP) neural network for real-time wheezing detection in respiratory sounds. As an established classification tool, the MLP has proven its ability to identify complex patterns within respiratory sounds. The proposed fully serial architecture uses a single calculation unit, independently of the number of neurons in the MLP network. It is also a fully scalable architecture that permits implementing MLP networks of any size easily and efficiently, without modifying the design or wiring. The proposed serial architecture has been implemented on a low-cost and power-efficient field-programmable gate array (FPGA) chip using a high-level programming tool. The respiratory sound classification rates are evaluated in terms of sensitivity, specificity, performance, and accuracy. The proposed serial architecture reaches the same classification performance as the parallel one, but it presents the main advantage of using much fewer hardware resources.
... The backpropagation multilayer perceptron (MLP) of [13] was designed based on very-large-scale integration (VLSI) parameters and FPGA, using the very-high-speed integrated circuit (VHSIC) hardware description language (VHDL) to analyze the chip performance. A spiking neural network (SNN) [23] targeting 64K neurons was designed on FPGA as a hardware accelerator. The performance of the neural network was enhanced using parallelization [15] applied in both the time and space domains. ...
Article
Full-text available
An artificial neural network (ANN) is a computational system that is designed to replicate and process the behavior of the human brain using neuron nodes. ANNs are made up of thousands of processing neurons with input and output modules that self-learn and compute data to offer the best results. The hardware realization of the massive neuron system is a difficult task. The research article emphasizes the design and realization of multiple input perceptron chips in Xilinx integrated system environment (ISE) 14.7 software. The proposed single-layer ANN architecture is scalable and accepts variable 64 inputs. The design is distributed in eight parallel blocks of ANN in which one block consists of eight neurons. The performance of the chip is analyzed based on the hardware utilization, memory, combinational delay, and different processing elements with targeted hardware Virtex-5 field-programmable gate array (FPGA). The chip simulation is performed in Modelsim 10.0 software. Artificial intelligence has a wide range of applications, and cutting-edge computing technology has a vast market. Hardware processors that are fast, affordable, and suited for ANN applications and accelerators are being developed by the industries. The novelty of the work is that it provides a parallel and scalable design platform on FPGA for fast switching, which is the current need in the forthcoming neuromorphic hardware.
... Through increases in computational power and in the amount of collected data, ANNs have undergone rapid development in recent years. There are many applications of ANNs in areas including pattern recognition, image processing, speech recognition, control systems, predictions, etc. [1]-[3]. Depending on the specific architecture, ANNs may consist of a large number of neurons clustered into layers. ...
... A look-up table (LUT) with 8192 elements has been exercised for the hyperbolic tangent function implementation on the CompactRIO hardware platform [10]. A similar technique - a LUT with linear interpolation between the LUT's points - has been applied in [1]. A Taylor series approximation of the hyperbolic tangent function is considered in [11]-[12]. ...
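A LUT with linear interpolation can be sketched in a few lines; the table size, interval, and saturation point below are illustrative choices, not those of [1] or [10].

```python
import math

N, XMAX = 64, 4.0              # 64 intervals covering [0, XMAX)
STEP = XMAX / N
TABLE = [math.tanh(i * STEP) for i in range(N + 1)]  # N+1 breakpoints

def tanh_lut(x):
    # Exploit the odd symmetry of tanh, saturate beyond XMAX, and
    # linearly interpolate between adjacent table entries inside.
    sign, ax = (1.0, x) if x >= 0.0 else (-1.0, -x)
    if ax >= XMAX:
        return sign
    i = int(ax / STEP)
    frac = ax / STEP - i
    return sign * (TABLE[i] + frac * (TABLE[i + 1] - TABLE[i]))

# maximum absolute error vs math.tanh over [-6, 6]
err = max(abs(tanh_lut(i / 100.0) - math.tanh(i / 100.0))
          for i in range(-600, 601))
```

Even this small 65-entry table keeps the error near 10^-3; interpolation is what allows a far smaller table than the 8192-element direct LUT mentioned above.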
... Accuracy of hyperbolic tangent implementations:

Method; arithmetic                                          Accuracy
LUT with linear approximation; fixed-point [1]              1.60E-7
PWL; floating-point [3]                                     7.40E-3
PWL; floating-point [4]                                     2.18E-5
Linear approximation; fixed-point [8]                       1.92E-4
CORDIC; fixed-point [5]                                     1.70E-7
CORDIC; floating-point [13]                                 2.47E-7
DCT interpolation; double-precision floating-point [14]     1.00E-5
Direct with McLaurin series; floating-point [19]            1.79E-7
Proposed, direct polynomial approximation; floating-point   5.89E-8
Proposed, Chebyshev approximation; fixed-point              5.59E-11
...
Article
Full-text available
The paper presents in detail a relatively simple implementation method of the hyperbolic tangent function, particularly targeted for FPGAs. The research goal of the proposed method was to examine the usage of the approximation of ordinary or Chebyshev polynomials for the implementation of the function. Several miscellaneous implementation versions have been considered. They differ in the polynomial degree, number of intervals for which the domain of the function is divided, etc. Both floating-point and fixed-point implementations have been presented. An impact on the FPGA resources utilization and calculations time for the implementation versions has also been briefly analyzed. Special attention has been paid to the accuracy of the calculations of the function. It turned out that applying the proposed method, a very high calculations accuracy can be achieved, while simultaneously maintaining reasonable resources utilization and short calculations time. The proposed method can be an effective alternative to other encountered implementation methods such as CORDIC. Additionally, the presented hardware architecture is more versatile and can be easily adapted for the implementation of other mathematical functions.
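The core of a Chebyshev approximation of tanh can be sketched in pure Python as below. The interval and number of coefficients are illustrative assumptions; the paper's fixed-point versions, interval splitting, and accuracy figures are not reproduced here.

```python
import math

A = 2.0   # approximation interval [-A, A]
N = 16    # number of Chebyshev coefficients (polynomial degree N-1)

# Sample tanh at the Chebyshev-Gauss nodes and form the series
# coefficients via the discrete cosine sum.
nodes = [math.cos(math.pi * (j + 0.5) / N) for j in range(N)]
fvals = [math.tanh(A * t) for t in nodes]
coef = [(2.0 / N) * sum(fvals[j] * math.cos(k * math.pi * (j + 0.5) / N)
                        for j in range(N)) for k in range(N)]
coef[0] *= 0.5

def tanh_cheb(x):
    # Clenshaw recurrence to evaluate the Chebyshev series at t = x/A.
    t = x / A
    b1 = b2 = 0.0
    for c in reversed(coef[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coef[0]

# maximum absolute error vs math.tanh over [-2, 2]
err = max(abs(tanh_cheb(i / 50.0) - math.tanh(i / 50.0))
          for i in range(-100, 101))
```

Because tanh is smooth on the interval, the Chebyshev coefficients decay geometrically, which is why the paper can reach very high accuracy with modest polynomial degrees and why the same machinery adapts easily to other functions.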
... The implementations of neuron models mainly include FPGA-based digital methods and analog circuit methods [21]-[24]. The FPGA platform consists of logic cell arrays and storage cells and can be reprogrammed flexibly via a hardware description language (HDL) [25], [26]. It has been widely applied in neural computing analysis owing to its short development time, high speed, high efficiency, and strong anti-interference ability [21], [27]. ...
Article
Multipliers are essential in implementing nonlinear neuron models, but they incur high implementation costs. Many multiplierless fitting schemes have been proposed to simplify the implementation of nonlinearities in neuron models. To optimize these schemes, this paper presents a nullcline-characteristics-based piecewise linear (NC-PWL) fitting scheme for multiplierless implementations of the Hindmarsh-Rose (HR) neuron model. This NC-PWL fitting scheme uses as few line segments as possible to approximate the critical nonlinearity characteristics of the local nullclines. A NC-PWL HR neuron model that reproduces the diverse firing patterns of the original one is successfully established. Using off-the-shelf low-cost components, an analog multiplierless circuit is designed for this fitting model and soldered on a printed circuit board (PCB). Meanwhile, by a logical-shift method, a digital multiplierless circuit with low resource consumption is developed for this fitting model on a field-programmable gate array (FPGA) platform. Experimental results of the analog and digital multiplierless hardware implementations verify the numerical simulations and show the simplicity and feasibility of the presented fitting scheme. Index Terms - Analog circuit, digital circuit, multiplierless implementation, neuron dynamics, Hindmarsh-Rose (HR) neuron model, piecewise linear (PWL) fitting scheme.
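The logical-shift idea behind multiplierless circuits can be sketched simply: a constant coefficient is approximated by a short sum of powers of two, so each constant multiplication becomes shifts and additions. The coefficient and shift set below are illustrative, not values from the paper.

```python
def shift_add_mul(x, shifts):
    # y = x * sum(2**-s for s in shifts): each term is a hardware
    # right-shift of x, and the terms are combined with adders only.
    return sum(x * 2.0 ** -s for s in shifts)

# Illustrative: approximate a coefficient of 0.9 by
# 2^-1 + 2^-2 + 2^-3 + 2^-5 = 0.90625 (relative error below 0.7%)
y = shift_add_mul(8.0, (1, 2, 3, 5))
```

Choosing as few shift terms as possible, like choosing as few PWL segments as possible, is what keeps the resource consumption of such circuits low.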
... Since the weights are fixed in [29], the multipliers can be optimized from general to the specific categories. The back propagation-based MLP architecture is shown in [13], where three MACs are used in each neuron. In [18], serial and parallel architectures of MLP are shown. ...
Article
Full-text available
Artificial neural networks (ANNs) are widely used in modern engineering applications. The decision on the number of layers and the number of nodes per layer in the ANN - the capacity of the ANN - is always non-trivial. A wrong decision on the capacity of the ANN causes underfitting or overfitting. This paper proposes various versatile, flexible hardware architectures for multilayer perceptron (MLP)-based neural networks, in which the number of layers and the number of nodes per layer can be changed according to the requirements of the application, avoiding underfitting or overfitting. Also, the weights of each node of the MLP can be fixed by the training phase. After the network has been trained, we can change the architecture of the MLP without affecting the accuracy. All the proposed and existing hardware designs of ANNs are implemented with 45 nm CMOS technology. The proposed high-throughput design with 3 layers and 512 nodes per layer achieves a 53.8% improvement in throughput as compared with the existing technique.
... Artificial neural networks (ANNs) have been used in the literature to solve problems in many different areas of knowledge [Mar et al. 2011] [Gomperts et al. 2011] [Patra and Chua 2010] [Achkar and Nasr 2010] [Harun et al. 2010] [Ferreira and Ludermir, 2009]. The success of this type of algorithm in applications has motivated the growing development of new models with different levels of complexity. ...
Preprint
The optimization of artificial neural networks (ANNs) is an important task for the successful use of these models in real-world applications. The solutions adopted for this task are expensive in general, involving trial-and-error procedures or expert knowledge, which are not always available. In this work, we investigated the use of meta-learning for the optimization of ANNs. Meta-learning is a research field aiming to automatically acquire knowledge that relates features of the learning problems to the performance of the learning algorithms. Meta-learning techniques were originally proposed and evaluated for the algorithm selection problem, and later for the optimization of parameters for Support Vector Machines. However, meta-learning can be adopted as a more general strategy to optimize ANN parameters, which motivates new efforts in this research direction. In the current work, we performed a case study using meta-learning to choose the number of hidden nodes for MLP networks, which is an important parameter to be defined aiming at good network performance. In our work, we generated a base of meta-examples associated with 93 regression problems. Each meta-example was generated from a regression problem and stored 16 features describing the problem (e.g., number of attributes and correlation among the problem attributes) together with the best number of nodes for this problem, empirically chosen from a range of possible values. This set of meta-examples was given as input to a meta-learner, which was able to predict the best number of nodes for new problems based on their features. The experiments performed in this case study revealed satisfactory results.