Paper Title
Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions

Authors

Maksim Velikanov, Dmitry Yarotsky

Abstract

Performance of optimization on quadratic problems sensitively depends on the low-lying part of the spectrum. For large (effectively infinite-dimensional) problems, this part of the spectrum can often be naturally represented or approximated by power law distributions, resulting in power law convergence rates for iterative solutions of these problems by gradient-based algorithms. In this paper, we propose a new spectral condition providing tighter upper bounds for problems with power law optimization trajectories. We use this condition to build a complete picture of upper and lower bounds for a wide range of optimization algorithms -- Gradient Descent, Steepest Descent, Heavy Ball, and Conjugate Gradients -- with an emphasis on the underlying schedules of learning rate and momentum. In particular, we demonstrate how an optimally accelerated method, its schedule, and convergence upper bound can be obtained in a unified manner for a given shape of the spectrum. Also, we provide first proofs of tight lower bounds for convergence rates of Steepest Descent and Conjugate Gradients under spectral power laws with general exponents. Our experiments show that the obtained convergence bounds and acceleration strategies are not only relevant for exactly quadratic optimization problems, but also fairly accurate when applied to the training of neural networks.
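The power-law convergence described in the abstract can be illustrated with a small numerical sketch: gradient descent on a diagonal quadratic whose eigenvalues follow a power law. The exponent, dimension, and initialization below are illustrative assumptions, not the paper's exact setting; the point is only that a power-law spectrum yields a power-law decay of the loss.

```python
import numpy as np

# Quadratic objective f(x) = 0.5 * x^T H x with H diagonal and
# power-law eigenvalues lambda_k = k^(-nu). The exponent nu and the
# all-ones initialization are illustrative assumptions.
nu = 1.5                     # spectral power-law exponent (assumption)
d = 10_000                   # large finite proxy for an infinite-dimensional problem
lam = np.arange(1, d + 1, dtype=float) ** (-nu)  # eigenvalues of H

x = np.ones(d)               # initial point, expressed in the eigenbasis of H
lr = 1.0 / lam.max()         # stable constant learning rate: eta <= 1 / lambda_max
losses = []
for t in range(1, 1001):
    x = x - lr * lam * x     # GD step on a diagonal quadratic: grad f = H x
    losses.append(0.5 * np.sum(lam * x ** 2))

# Fit the empirical decay exponent of L(t) ~ t^(-xi) over the tail of the run
ts = np.arange(1, 1001)
xi = -np.polyfit(np.log(ts[100:]), np.log(np.array(losses)[100:]), 1)[0]
print(f"empirical loss decay exponent: {xi:.2f}")
```

With a constant learning rate the loss decays roughly as a power of the iteration count, rather than geometrically; the accelerated schedules studied in the paper improve this power-law exponent.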
