Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs
2017
Convolutional neural networks (CNNs) have been widely applied in many deep learning applications. In recent years, FPGA implementations of CNNs have attracted much attention because of their high performance and energy efficiency. However, existing implementations have difficulty fully leveraging the computation power of the latest FPGAs. In this paper we implement CNNs on an FPGA using a systolic array architecture, which can achieve a high clock frequency under high resource utilization. We provide an analytical model for performance and resource utilization and develop an automatic design space exploration framework, as well as a source-to-source code transformation from a C program to a CNN implementation using a systolic array. The experimental results show that our framework is able to generate accelerators for real-life CNN models, achieving up to 461 GFlops for the floating point data type and 1.2 Tops for 8-16 bit fixed point.
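The abstract does not spell out the analytical model. As a rough illustration of how such a model can drive design space exploration for a systolic array, the sketch below enumerates array shapes under a DSP budget. It scores each shape by peak throughput discounted by the padding waste incurred when layer dimensions are not multiples of the array dimensions.

Everything here is a hypothetical simplification, not the paper's model. The following are all assumptions:
- the `Design` fields
- the mapping of output channels, output pixels and input channels onto rows, columns and SIMD width
- the one-DSP-per-MAC cost
- the fixed 200 MHz clock

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Design:
    rows: int  # PE rows (hypothetically mapped to output channels)
    cols: int  # PE columns (hypothetically mapped to output pixels)
    vec: int   # SIMD width per PE (hypothetically mapped to input channels)


def dsp_cost(d: Design, dsp_per_mac: int) -> int:
    # Assumption: every MAC unit in the array consumes dsp_per_mac DSP blocks.
    return d.rows * d.cols * d.vec * dsp_per_mac


def tiling_efficiency(dim: int, tile: int) -> float:
    # Fraction of PE slots doing useful work when `dim` is padded
    # up to the next multiple of `tile`.
    return dim / (ceil(dim / tile) * tile)


def effective_gflops(d: Design, layer: tuple, freq_mhz: float) -> float:
    out_ch, out_pix, in_ch = layer
    # Peak throughput: 2 FLOPs (multiply + add) per MAC per cycle.
    peak = 2 * d.rows * d.cols * d.vec * freq_mhz / 1000.0
    eff = (tiling_efficiency(out_ch, d.rows)
           * tiling_efficiency(out_pix, d.cols)
           * tiling_efficiency(in_ch, d.vec))
    return peak * eff


def explore(layer, dsp_budget, dsp_per_mac=1, freq_mhz=200.0):
    """Exhaustively search hypothetical array shapes; return (design, GFLOP/s)."""
    best = None
    for rows, cols, vec in product(range(1, 33), range(1, 33), (1, 2, 4, 8, 16)):
        d = Design(rows, cols, vec)
        if dsp_cost(d, dsp_per_mac) > dsp_budget:
            continue  # violates the resource model
        score = effective_gflops(d, layer, freq_mhz)
        if best is None or score > best[1]:
            best = (d, score)
    return best


if __name__ == "__main__":
    # Hypothetical layer: 64 output channels, 56 output pixels per row, 64 input channels.
    design, gflops = explore((64, 56, 64), dsp_budget=1024)
    print(design, f"{gflops:.1f} GFLOP/s")
```

A real framework would also model on-chip buffer (BRAM) usage, off-chip bandwidth and how the achievable clock frequency degrades with array size. These are exactly the effects that make a systolic layout attractive, because its short, local PE-to-PE connections help keep frequency high as the array grows.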
Keywords:
- Convolutional neural network
- Deep learning
- Parallel computing
- Electronic engineering
- Convolutional code
- Field-programmable gate array
- Design space exploration
- Computer science
- Architecture
- Clock rate
- Systolic array
- Artificial intelligence
- Theoretical computer science
- Spiking neural network
- Floating point
References: 24
Citations: 261