Flow diagram of a QNN for object detection.

Source publication
With the recent growth of the Internet of Things (IoT) and the demand for faster computation, quantized neural networks (QNNs) and QNN-enabled IoT devices can offer better performance than conventional convolutional neural networks (CNNs). With the aim of reducing memory access costs and increasing computation efficiency, QNN-enabled devices are expected...
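The memory-access argument can be made concrete with a minimal sketch of uniform weight quantization (plain NumPy; the function name, the 8-bit choice, and the tensor shapes are illustrative assumptions, not taken from the article). Storing k-bit integer codes plus a single scale in place of float32 weights cuts weight memory traffic by roughly 32/k.

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Uniformly quantize a float32 tensor to signed `bits`-bit codes.

    Valid for bits <= 8 here, since the codes are stored as int8.
    Storing the codes plus one scale instead of float32 weights
    reduces weight memory traffic by roughly 32 / bits.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax              # map the largest |weight| to qmax
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

w = np.random.randn(64, 64).astype(np.float32)  # hypothetical weight matrix
codes, scale = quantize_uniform(w)
w_hat = codes.astype(np.float32) * scale        # dequantized approximation
print("max quantization error:", np.abs(w - w_hat).max())  # bounded by ~scale / 2
```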

Context in source publication

Context 1
... a suitable quantization architecture should be able to store useful information in continuous variables and is critical to network performance. Figure 7 describes the typical blocks of a QNN flowchart. ...
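As a rough illustration of such a block, the sketch below chains a binarized-weight layer, a full-precision batch normalization, and a binarizing activation; the batch-norm statistics and float accumulator are the continuous variables that carry information between binary layers. The layer is written as a matrix multiply rather than a convolution for brevity, and all names are assumptions rather than the figure's notation.

```python
import numpy as np

def binarize(x):
    # Quantize to {-1, +1}; zero maps to +1 by convention.
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

def qnn_block(x, w_fp, gamma, beta, eps=1e-5):
    """One QNN block: binary-weight layer -> batch norm -> binary activation.

    The float accumulator `y` and the batch-norm parameters (gamma, beta)
    are the continuous variables preserved between binary layers.
    """
    y = x @ binarize(w_fp)                          # binary-weight "conv" (as matmul)
    mu, var = y.mean(axis=0), y.var(axis=0)
    y_bn = gamma * (y - mu) / np.sqrt(var + eps) + beta
    return binarize(y_bn)                           # quantized activation

x = binarize(np.random.randn(32, 128))              # binary inputs from a previous block
w = np.random.randn(128, 64).astype(np.float32)     # latent full-precision weights
out = qnn_block(x, w,
                gamma=np.ones(64, np.float32),
                beta=np.zeros(64, np.float32))
print(out.shape, np.unique(out))                    # (32, 64) [-1.  1.]
```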

Citations

... However, GPUs are no longer an ideal computing platform for deploying neural networks, owing to their high power consumption. In recent years, field-programmable gate arrays (FPGAs) have attracted increasing attention from academia and industry for CNN accelerators [7,8], owing to their abundance of computing units and on-chip memory blocks. ...
Speech recognition has progressed tremendously in the area of artificial intelligence (AI). However, the performance of real-time offline Chinese speech recognition neural network accelerators for edge AI needs to be improved. This paper proposes a configurable convolutional neural network accelerator based on a lightweight speech recognition model, which can dramatically reduce hardware resource consumption while guaranteeing an acceptable error rate. For convolutional layers, the weights are binarized to reduce the number of model parameters and improve computational and storage efficiency. A multichannel shared computation (MCSC) architecture is proposed to maximize the reuse of weight and feature-map data. A binary weight-sharing processing engine (PE) is designed to avoid limiting the number of multipliers. A custom instruction set is established according to the variable length of the voice input to configure parameters for adapting to different network structures. Finally, a ping-pong storage method is used when the feature map is the input. We implemented this accelerator on a Xilinx ZYNQ XC7Z035 at a working frequency of 150 MHz. The processing times for 2.24 s and 8 s of speech were 69.8 ms and 189.51 ms, respectively, and the convolution performance reached 35.66 GOPS/W. Compared with other computing platforms, the accelerator performs better in terms of energy efficiency, power consumption, and hardware resource consumption.
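A small sketch of why binarized weights pay off in hardware: with weights constrained to {-1, +1}, each multiply-accumulate reduces to an add or a subtract of the input sample, so a processing engine needs no multipliers. This illustrates only the general principle; the MCSC architecture and the custom instruction set from the paper are not reproduced here, and all names and shapes are assumptions.

```python
import numpy as np

def binary_weight_conv1d(x, w_sign):
    """1-D convolution with weights constrained to {-1, +1}.

    Each output is computed without multiplies: samples aligned with a
    +1 weight are added, samples aligned with a -1 weight are subtracted.
    """
    k = len(w_sign)
    out = np.empty(len(x) - k + 1, dtype=np.float32)
    for i in range(len(out)):
        window = x[i:i + k]
        # multiplier-free MAC: add where the weight is +1, subtract where -1
        out[i] = window[w_sign > 0].sum() - window[w_sign < 0].sum()
    return out

x = np.random.randn(16).astype(np.float32)
w = np.where(np.random.randn(5) >= 0, 1, -1)        # binarized kernel
ref = np.array([x[i:i + 5] @ w for i in range(len(x) - 4)])  # naive MAC reference
np.testing.assert_allclose(binary_weight_conv1d(x, w), ref, rtol=1e-5, atol=1e-5)
print("multiplier-free result matches the multiply-accumulate reference")
```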