Paper Title

VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

Authors

Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang

Abstract

Transformer architectures with attention mechanisms have achieved success in Natural Language Processing (NLP), and Vision Transformers (ViTs) have recently extended the application domain to various vision tasks. While achieving high performance, ViTs suffer from large model sizes and high computation complexity that hinder their deployment on edge devices. To achieve high throughput on hardware and preserve the model accuracy simultaneously, we propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized ViTs with binary weights and low-precision activations. Given the model structure and the desired frame rate, VAQF automatically outputs the required quantization precision for activations as well as the optimized parameter settings of the accelerator that fulfill the hardware requirements. The implementations are developed with Vivado High-Level Synthesis (HLS) on the Xilinx ZCU102 FPGA board, and the evaluation results with the DeiT-base model indicate that a frame rate requirement of 24 frames per second (FPS) is satisfied with 8-bit activation quantization, and a target of 30 FPS is met with 6-bit activation quantization. To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework that guides the quantization strategy on the software side and the accelerator implementation on the hardware side given the target frame rate. The compilation time cost is very small compared with quantization training, and the generated accelerators show the capability of achieving real-time execution for state-of-the-art ViT models on FPGAs.
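The quantization scheme described in the abstract (binary weights plus n-bit activations) can be illustrated with a minimal sketch. This is not the paper's exact formulation; it assumes a common binary-weight scheme (sign of the weight scaled by the mean absolute value) and uniform per-tensor activation quantization, with the helper names `binarize_weights` and `quantize_activations` chosen here for illustration.

```python
import numpy as np

def binarize_weights(w):
    # Binary weight quantization: each weight becomes +/- alpha,
    # where alpha is a per-tensor scaling factor (mean absolute value).
    # This is a common scheme (as in BinaryConnect/XNOR-style methods);
    # the paper's exact formulation may differ.
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def quantize_activations(x, n_bits):
    # Uniform n-bit quantization of non-negative activations:
    # map [0, max(x)] onto 2**n_bits - 1 evenly spaced levels.
    levels = 2 ** n_bits - 1
    x_max = np.max(x)
    if x_max == 0:
        return x
    scale = x_max / levels
    return np.round(x / scale) * scale

w = np.array([0.4, -0.2, 0.1, -0.7])
x = np.array([0.0, 0.3, 0.8, 1.0])
print(binarize_weights(w))          # each weight becomes +/- mean(|w|) = +/- 0.35
print(quantize_activations(x, 6))   # activations snapped to 6-bit levels
```

With 6-bit activations the quantization step for inputs in [0, 1] is 1/63, so the rounding error per activation is at most half a step, which is the kind of precision/throughput trade-off VAQF navigates when picking the activation bit-width for a target frame rate.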
