Paper Title
Preconditioned Gradient Descent for Overparameterized Nonconvex Burer--Monteiro Factorization with Global Optimality Certification
Paper Authors
Paper Abstract
We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
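To make the setup concrete, the following is a minimal numerical sketch, not the paper's exact algorithm. It runs gradient descent on $f(X)=\phi(XX^{T})$ with an assumed model cost $\phi(M)=\tfrac{1}{2}\|M-M^{\star}\|_{F}^{2}$ and a right preconditioner of the form $(X^{T}X+\eta I)^{-1}$; the choice of $\phi$, the damping rule $\eta=\|\nabla f(X)\|_{F}$, and the step size are illustrative assumptions, not details taken from the abstract.

```python
# A minimal sketch, not the paper's exact algorithm. Assumptions (not from the
# abstract): phi(M) = 0.5*||M - M_star||_F^2, damping eta = ||grad f(X)||_F,
# and the step size below.
import numpy as np

rng = np.random.default_rng(0)
n, r_star, r = 50, 3, 6                  # search rank r > r_star (overparameterized)
X_star = rng.standard_normal((n, r_star))
M_star = X_star @ X_star.T               # rank-r_star global minimizer of phi

def grad_f(X):
    # For phi(M) = 0.5*||M - M_star||_F^2: grad f(X) = 2*(X X^T - M_star) X.
    return 2.0 * (X @ X.T - M_star) @ X

X = 0.1 * rng.standard_normal((n, r))    # small random initialization
step = 0.1
for _ in range(1000):
    G = grad_f(X)
    eta = np.linalg.norm(G)              # damping shrinks to zero near optimality
    # Preconditioned update: X <- X - step * G @ (X^T X + eta*I)^{-1}.
    P = X.T @ X + eta * np.eye(r)
    X -= step * np.linalg.solve(P, G.T).T

print("cost:", 0.5 * np.linalg.norm(X @ X.T - M_star) ** 2)
# With r > r_star, the minimizer is rank deficient, so sigma_min(X) -> 0.
print("sigma_min(X):", np.linalg.svd(X, compute_uv=False)[-1])
```

At convergence the smallest singular value of $X$ tends to zero because $r>r^{\star}$; this rank deficiency is exactly the certificate of global optimality described in the abstract.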