Paper Title

Teal: Learning-Accelerated Optimization of WAN Traffic Engineering

Authors

Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu

Abstract

The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control. First, Teal designs a flow-centric graph neural network (GNN) to capture WAN connectivity and network flows, learning flow features as inputs to downstream allocation. Second, to reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand while optimizing a central TE objective. Finally, Teal fine-tunes allocations with ADMM (Alternating Direction Method of Multipliers), a highly parallelizable optimization algorithm for reducing constraint violations such as overutilized links. We evaluate Teal using traffic matrices from Microsoft's WAN. On a large WAN topology with >1,700 nodes, Teal generates near-optimal flow allocations while running several orders of magnitude faster than the production optimization engine. Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.
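
To make "constraint violations such as overutilized links" concrete, here is a minimal sketch assuming the common path-based TE formulation, in which each demand is split across a few candidate paths. The abstract does not specify Teal's data layout, so every name below (`link_loads`, `overutilized`, `scale_to_fit`, and the dictionary shapes) is hypothetical. The repair step is a naive proportional scale-down used only as a stand-in; it is not ADMM and not Teal's fine-tuning procedure.

```python
from collections import defaultdict


def link_loads(demands, paths, split):
    """Sum the traffic placed on each directed link.

    demands: {demand_id: volume}
    paths:   {demand_id: [path, ...]}, each path a list of nodes
    split:   {demand_id: [ratio, ...]}, one ratio per candidate path
    (All three shapes are assumptions made for this illustration.)
    """
    loads = defaultdict(float)
    for d, volume in demands.items():
        for path, ratio in zip(paths[d], split[d]):
            for link in zip(path, path[1:]):
                loads[link] += volume * ratio
    return dict(loads)


def overutilized(loads, capacity):
    """Return {link: utilization} for links whose load exceeds capacity."""
    return {l: load / capacity[l] for l, load in loads.items() if load > capacity[l]}


def scale_to_fit(demands, paths, split, capacity):
    """Naive repair (NOT ADMM): shrink each path's ratio by the utilization
    of its most congested link, so that no link stays over capacity."""
    loads = link_loads(demands, paths, split)
    fixed = {}
    for d in demands:
        fixed[d] = []
        for path, ratio in zip(paths[d], split[d]):
            worst = max(
                (loads[l] / capacity[l] for l in zip(path, path[1:])),
                default=1.0,
            )
            fixed[d].append(ratio / max(worst, 1.0))
    return fixed


if __name__ == "__main__":
    # Toy three-node topology; all numbers are made up for illustration.
    demands = {"A->C": 10.0, "B->C": 6.0}
    paths = {"A->C": [["A", "C"], ["A", "B", "C"]], "B->C": [["B", "C"]]}
    split = {"A->C": [0.5, 0.5], "B->C": [1.0]}
    capacity = {("A", "C"): 5.0, ("A", "B"): 10.0, ("B", "C"): 8.0}

    print("violations before:", overutilized(link_loads(demands, paths, split), capacity))
    repaired = scale_to_fit(demands, paths, split, capacity)
    print("violations after: ", overutilized(link_loads(demands, paths, repaired), capacity))
```

Scaling down this way always restores feasibility, but it gives up demand that a better reallocation could have kept by shifting traffic to other paths. That gap is the kind of inefficiency an optimization-based fine-tuning step is meant to close.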
