Title

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Authors

Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K.-H. So, Xuehai Qian, Yanzhi Wang, Xue Lin

Abstract

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step for deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the observation that the distributions of weights in different rows are not the same; and (2) the potential for better utilization of heterogeneous FPGA hardware resources. To achieve this, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2), suitable for Gaussian-like weight distributions, in which multiplication arithmetic can be replaced with logic shifters and adders, thereby enabling highly efficient implementations with FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for uniform-like weight distributions and can be implemented efficiently by DSPs. Then, to fully exploit both types of resources, we propose an FPGA-centric mixed scheme quantization (MSQ) with an ensemble of the proposed SP2 and fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy due to better matching with the weight distributions.
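To make the SP2 idea concrete, the following is a minimal sketch (not the paper's implementation) of quantizing a weight to the nearest signed sum of two powers of two, and of the corresponding multiply-free shift-and-add arithmetic. The function names and the exponent range are illustrative assumptions.

```python
def sp2_quantize(w, exp_range=range(0, 5)):
    """Approximate |w| by 2**-a + 2**-b for exponents a, b in exp_range,
    keeping the sign of w. Illustrative sketch of the SP2 scheme from the
    abstract; the paper's actual codebook/exponent range may differ."""
    sign = -1.0 if w < 0 else 1.0
    best, best_err = 0.0, float("inf")
    for a in exp_range:
        for b in exp_range:
            cand = 2.0 ** -a + 2.0 ** -b
            err = abs(abs(w) - cand)
            if err < best_err:
                best, best_err = cand, err
    return sign * best


def sp2_mul(x, a, b):
    """Multiply integer activation x by the SP2 weight 2**-a + 2**-b using
    only shifts and an add -- the operation that maps onto FPGA LUTs
    instead of DSP multipliers."""
    return (x >> a) + (x >> b)
```

For example, a weight of 0.6 quantizes to 0.625 = 2^-1 + 2^-3, so multiplying an activation x by it reduces to `(x >> 1) + (x >> 3)`. This is why SP2 rows can be served by LUT logic while fixed-point rows use the DSP blocks, motivating the per-row mixed scheme.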
