Paper title
Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates
Paper authors
Paper abstract
Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. Classically, the surrogates are required to be strongly convex, and no convergence rate analysis was available for the general nonconvex setting. In this paper, we propose an extension of SMM in which the surrogates are allowed to be only weakly convex or block multi-convex, and the averaged surrogates are approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under some additional assumptions, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods under a general nonconvex dependent data setting: double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.
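For intuition, here is a minimal sketch of the prox-regularized update described in the abstract; the notation (weight sequence $w_n$, proximal parameter $\lambda > 0$, constraint set $\Theta$, surrogate $g_n$ of the loss at data point $x_n$, and running average $\bar{g}_n$) is illustrative and not taken verbatim from the paper. At iteration $n$, a new data point is sampled, a (possibly only weakly convex) surrogate $g_n$ of the loss is formed at the current iterate, the recursive average of surrogates is updated, and the average is minimized with a proximal term that compensates for the lack of strong convexity:

$$\bar{g}_n = (1 - w_n)\,\bar{g}_{n-1} + w_n\, g_n, \qquad \theta_n \in \operatorname*{arg\,min}_{\theta \in \Theta} \left( \bar{g}_n(\theta) + \frac{\lambda}{2}\,\|\theta - \theta_{n-1}\|^2 \right).$$

In the block multi-convex variant described in the abstract, the proximal step would instead be replaced by minimizing $\bar{g}_n$ one block of coordinates at a time within a trust region of diminishing radius around $\theta_{n-1}$.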