Parallel Scheduling Self-attention Mechanism: Generalization and Optimization

Paper title

Parallel Scheduling Self-attention Mechanism: Generalization and Optimization

Authors

Mingfei Yu, Masahiro Fujita

Abstract

Over the past few years, self-attention has risen to prominence in deep learning, especially in natural language processing (NLP). Its impressive effectiveness, along with its ubiquitous implementations, has aroused our interest in efficiently scheduling the data flow of the corresponding computations onto architectures with many computing units to realize parallel computing. In this paper, based on the theory of the self-attention mechanism and state-of-the-art realizations of self-attention in language models, we propose a general scheduling algorithm that parallelizes the typical computations of self-attention; it is derived from the optimum schedules for small instances found by a satisfiability (SAT) solver. We also put forward strategies for further optimization by skipping redundant computations, which reduce the original computations by almost 25% and 50%, respectively, for two widely adopted application schemes of self-attention. With the proposed optimizations adopted, we correspondingly devise two further scheduling algorithms. The proposed algorithms are applicable regardless of problem size, as long as the number of input vectors is divisible by the number of computing units available in the architecture. Because proving the correctness of the algorithms mathematically is complex for general cases, we have conducted experiments that demonstrate their validity, as well as the superior quality of the solutions they provide, by solving SAT problems for particular instances.
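
To make the scheduling problem concrete, the sketch below spreads the pairwise (query, key) score computations of self-attention evenly over a fixed number of computing units. It is a naive baseline written for illustration only and is not the SAT-derived algorithm of the paper; the function name `naive_schedule` and its parameters are hypothetical. It does reflect the applicability condition stated in the abstract: the number of input vectors must be divisible by the number of computing units.

```python
from itertools import product


def naive_schedule(n_inputs: int, n_units: int) -> list[list[tuple[int, int]]]:
    """Assign every (query, key) score computation to a time step and a unit.

    Hypothetical baseline for illustration; NOT the paper's SAT-derived schedule.
    """
    if n_inputs % n_units != 0:
        raise ValueError("number of input vectors must be divisible by number of units")
    # All n^2 (query index, key index) pairs whose scores must be computed.
    pairs = list(product(range(n_inputs), repeat=2))
    # Each inner list is one parallel time step; position u in it belongs to unit u.
    return [pairs[t:t + n_units] for t in range(0, len(pairs), n_units)]


schedule = naive_schedule(n_inputs=4, n_units=2)
print(len(schedule))  # number of parallel time steps
print(schedule[0])    # pairs handled by the units in the first step
```

Because the number of inputs is a multiple of the number of units, every time step keeps all units busy. A real schedule would also have to account for data movement between units, which this sketch ignores.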
