Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines

Paper Title

Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines

Authors

Yixin Xu, Zijian Zhao, Yi Xiao, Tongguang Yu, Halid Mulaosmanovic, Dominik Kleimaier, Stefan Duenkel, Sven Beyer, Xiao Gong, Rajiv Joshi, X. Sharon Hu, Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti, Eric Homan, Sumitha George, Vijaykrishnan Narayanan, Kai Ni

Abstract

Field-programmable gate arrays (FPGAs) are widely used to accelerate deep learning applications because of their reconfigurability, flexibility, and fast time-to-market. However, conventional FPGAs suffer from a tradeoff between chip area and reconfiguration latency, so efficient FPGA acceleration that requires switching between multiple configurations remains elusive. In this paper, we perform technology-circuit-architecture co-design to break this tradeoff, with no additional area cost and lower power consumption than conventional designs, while providing dynamic reconfiguration that can hide the reconfiguration time behind the execution time. Leveraging the intrinsic transistor structure and non-volatility of the ferroelectric FET (FeFET), we propose and experimentally verify compact FPGA primitives, including a 1FeFET look-up table (LUT) cell and a 1FeFET routing cell for connection blocks (CBs) and switch boxes (SBs). To support dynamic reconfiguration, two local copies of the primitives are placed in parallel, which enables loading an arbitrary configuration without interrupting execution of the active configuration. A comprehensive evaluation shows that, compared with an SRAM-based FPGA, our dynamic reconfiguration design achieves a 63.0%/71.1% reduction in LUT/CB area and an 82.7%/53.6% reduction in CB/SB power consumption, with a minimal penalty (9.6%) in critical path delay. We further implement a Super-Sub network model to show the benefit of the context-switching capability of our design. We also evaluate the timing performance of our design against a conventional FPGA in various application scenarios. In one scenario, where users switch between two preloaded configurations, our design saves 78.7% of the time on average. In the other scenario, where multiple configurations are implemented with dynamic reconfiguration, our design saves 20.3% of the time on average.
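The idea of hiding reconfiguration time behind execution time can be illustrated with a simple timing model. The sketch below is not taken from the paper: the function names, the assumption that a context switch between the two local copies costs zero time, and the example durations are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def sequential_time(exec_times: Sequence[float],
                    reconfig_times: Sequence[float]) -> float:
    """Single-context flow (assumed model): load each configuration, then run it."""
    if len(exec_times) != len(reconfig_times):
        raise ValueError("need one reconfiguration time per execution time")
    return sum(reconfig_times) + sum(exec_times)


def overlapped_time(exec_times: Sequence[float],
                    reconfig_times: Sequence[float]) -> float:
    """Two-copy flow (assumed model): while configuration i runs on the active
    copy, configuration i+1 is loaded into the idle copy. The switch itself is
    assumed to take zero time, which is a simplification."""
    if len(exec_times) != len(reconfig_times):
        raise ValueError("need one reconfiguration time per execution time")
    if not exec_times:
        return 0.0
    total = reconfig_times[0]  # the first load cannot be hidden
    for i, t_exec in enumerate(exec_times):
        # Loading the next configuration overlaps with the current execution;
        # the stage lasts as long as the slower of the two.
        next_load = reconfig_times[i + 1] if i + 1 < len(exec_times) else 0.0
        total += max(t_exec, next_load)
    return total


# Hypothetical durations in arbitrary time units (not measured data).
exec_times = [4.0, 3.0, 5.0]
reconfig_times = [2.0, 2.0, 2.0]
print(sequential_time(exec_times, reconfig_times))  # 18.0
print(overlapped_time(exec_times, reconfig_times))  # 14.0
```

In this toy model the saving depends on how reconfiguration time compares with execution time: a load is fully hidden only when it finishes before the running configuration does.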
