DreamShard: Generalizable Embedding Table Placement for Recommender Systems

Paper Title

DreamShard: Generalizable Embedding Table Placement for Recommender Systems

Paper Authors

Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu

Abstract

We study embedding table placement for distributed recommender systems, which aims to partition the tables and place them on multiple hardware devices (e.g., GPUs) so as to balance computation and communication costs. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains a challenging problem because of 1) the operation fusion of embedding tables, and 2) the requirement to generalize to unseen placement tasks with different numbers of tables and/or devices. To this end, we present DreamShard, a reinforcement learning (RL) approach for embedding table placement. DreamShard reasons about operation fusion and achieves generalizability with 1) a cost network that directly predicts the costs of the fused operation, and 2) a policy network that is efficiently trained on an estimated Markov decision process (MDP) without real GPU execution, where the states and the rewards are estimated with the cost network. Equipped with sum and max representation reductions, the two networks can directly generalize to unseen tasks with different numbers of tables and/or devices without fine-tuning. Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies, with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available at https://github.com/daochenzha/dreamshard
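To illustrate why sum and max reductions let a network handle any number of tables and devices, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the two table features, the fixed weights standing in for a learned encoder, and the choice to sum over tables within a device and take the max across devices are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]
HIDDEN = 3  # size of the hypothetical table representation

def embed_table(features: Sequence[float]) -> Vector:
    """Hypothetical per-table encoder: a fixed linear map standing in for a learned network."""
    weights = [[0.5, 0.1], [0.2, 0.3], [0.0, 1.0]]  # HIDDEN rows x 2 input features
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def sum_reduce(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum; works for any number of vectors, including zero."""
    out = [0.0] * HIDDEN
    for vec in vectors:
        for i, value in enumerate(vec):
            out[i] += value
    return out

def max_reduce(vectors: Sequence[Vector]) -> Vector:
    """Element-wise max across vectors (assumed here to pool across devices)."""
    return [max(column) for column in zip(*vectors)]

def estimate_cost(placement: Sequence[Sequence[Sequence[float]]]) -> float:
    """Score a placement given as one list of table feature vectors per device.

    Assumption: tables on a device are sum-pooled, devices are max-pooled,
    and the final readout is a plain sum instead of a learned head.
    """
    device_reprs = [sum_reduce([embed_table(t) for t in tables]) for tables in placement]
    return sum(max_reduce(device_reprs))

# Two hypothetical features per table, e.g. (dimension, pooling factor).
two_devices = [[(1.0, 2.0), (3.0, 1.0)], [(2.0, 2.0)]]
three_devices = [[(1.0, 2.0)], [(3.0, 1.0)], [(2.0, 2.0)]]
cost_two = estimate_cost(two_devices)
cost_three = estimate_cost(three_devices)
```

Because both reductions are order-independent and accept any number of inputs, the same function scores placements with different table and device counts, and reordering tables or devices leaves the estimate unchanged. In an RL setup like the one the abstract describes, such an estimate could supply rewards without running anything on a GPU.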
