Paper title
FARSI: Facebook AR System Investigator for Agile Domain-Specific System-on-Chip Exploration
Authors
Abstract
Domain-specific SoCs (DSSoCs) are attractive solutions for domains with stringent power/performance/area constraints; however, they suffer from two fundamental complexities. On the one hand, their many specialized hardware blocks result in complex systems and thus high development effort. On the other hand, their many system knobs enlarge the design space, making the search for the optimal design difficult. Thus, for DSSoCs to reach prevalence, these complexities must be tamed. This work identifies the necessary features of an early-stage design space exploration (DSE) framework that targets the complex design space of DSSoCs, and further provides an instance of one called FARSI, (F)acebook (AR) (S)ystem (I)nvestigator. Concretely, FARSI provides an agile system-level simulator with an 8,400X speedup and 98.5% accuracy compared to Synopsys Platform Architect. FARSI also provides an efficient exploration heuristic and achieves up to a 16X improvement in convergence time compared to naive simulated annealing (SA). This is done by augmenting SA with architectural reasoning such as locality exploitation and bottleneck relaxation. Furthermore, we embed various co-design capabilities and show that, on average, they have a 32% impact on the convergence rate. Finally, we demonstrate that using simple development-cost-aware policies can lower the system complexity, both in terms of component count and variation, by as much as 150% and 118% (e.g., for the Network-on-a-Chip subsystem).
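To make the idea of augmenting SA with bottleneck relaxation more concrete, the following is a minimal sketch on a toy model, not FARSI's actual implementation. Everything in it is an assumption made for illustration: the `WORK` table, the latency budget, the cost function, and the names `naive_move`, `guided_move`, and `anneal` are all hypothetical. Each block has a discrete speed level; a naive move perturbs a random block, while the guided move upgrades the current bottleneck when the latency budget is missed and otherwise trims a non-bottleneck block.

```python
import math
import random

# Hypothetical toy model: block i processes WORK[i] units at speed levels[i].
WORK = [8.0, 2.0, 4.0, 1.0]
LATENCY_BUDGET = 2.0
MAX_LEVEL = 8


def block_latencies(levels):
    """Per-block latency; system latency is the maximum of these."""
    return [w / l for w, l in zip(WORK, levels)]


def cost(levels):
    """Heavily weighted latency overshoot plus the sum of levels (area proxy)."""
    overshoot = max(0.0, max(block_latencies(levels)) - LATENCY_BUDGET)
    return 100.0 * overshoot + sum(levels)


def bottleneck(levels):
    """Index of the slowest block (first one on ties)."""
    lat = block_latencies(levels)
    return lat.index(max(lat))


def naive_move(levels, rng):
    """Naive SA neighbor: nudge a random block up or down by one level."""
    new = list(levels)
    i = rng.randrange(len(new))
    new[i] = min(MAX_LEVEL, max(1, new[i] + rng.choice([-1, 1])))
    return new


def guided_move(levels, rng):
    """Bottleneck-aware neighbor (assumed policy, for illustration only)."""
    new = list(levels)
    b = bottleneck(levels)
    if max(block_latencies(levels)) > LATENCY_BUDGET:
        # Budget missed: relax the bottleneck by speeding up that block.
        new[b] = min(MAX_LEVEL, new[b] + 1)
    else:
        # Budget met: try to save area on a block that is not the bottleneck.
        i = rng.choice([j for j in range(len(new)) if j != b])
        new[i] = max(1, new[i] - 1)
    return new


def anneal(move, seed=0, iters=200, t0=5.0):
    """Standard SA loop with a linear cooling schedule; returns the best design seen."""
    rng = random.Random(seed)
    cur = [1] * len(WORK)
    best = cur
    for k in range(iters):
        temp = t0 * (1 - k / iters) + 1e-9
        cand = move(cur, rng)
        delta = cost(cand) - cost(cur)
        # Always accept improvements; accept regressions with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur = cand
        if cost(cur) < cost(best):
            best = cur
    return best


if __name__ == "__main__":
    for name, move in [("naive", naive_move), ("guided", guided_move)]:
        design = anneal(move)
        print(name, design, cost(design))
```

The only difference between the two runs is the neighbor-selection function, which mirrors the abstract's framing of architectural reasoning as an add-on to plain SA. The toy says nothing about the reported 16X figure; it only shows where such reasoning plugs into the loop.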