表征当地强烈凸高风险的多余风险

论文标题

表征当地强烈凸高风险的多余风险

Characterization of Excess Risk for Locally Strongly Convex Population Risk

论文作者

Yi, Mingyang, Wang, Ruoyu, Ma, Zhi-Ming

论文摘要

我们为通过适当的迭代算法训练的模型的预期多余风险建立了上限，该算法近似于局部最小值。与基于强烈的全球强烈凸性或全球增长条件（例如，PL-nequality）所构建的结果不同，我们只要求人口风险在其本地最小值周围强烈凸起。具体而言，我们在凸问题下的约束是$ \ tilde {\ co}（1/n）$。对于$ d $型号参数的非凸问题，$ d/n $小于独立于$ n $的阈值，如果经验风险没有较高的局部最小值，则可以维持$ \ tilde {\ co}（1/n）$的订单。此外，没有这样的假设的$ \ tilde {\ co}（1/\ sqrt {n}）$没有这样的假设而变成$ \ tilde {\ co}（1/\ sqrt {n}）$。我们的结果是通过算法稳定性和经验风险景观表征得出的。与现有的基于算法的稳定性结果相比，我们的界限是尺寸不敏感的，并且对算法的实现，学习率和迭代次数无限制。我们的界限强调了，在本地强烈凸出的人口风险中，通过任何适当的迭代算法训练的模型即使对于非凸面问题也可以很好地概括，而$ d $也很大。

We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms which approximate the local minima. Unlike the results built upon the strong globally strongly convexity or global growth conditions e.g., PL-inequality, we only require the population risk to be \emph{locally} strongly convex around its local minima. Concretely, our bound under convex problems is of order $\tilde{\cO}(1/n)$. For non-convex problems with $d$ model parameters such that $d/n$ is smaller than a threshold independent of $n$, the order of $\tilde{\cO}(1/n)$ can be maintained if the empirical risk has no spurious local minima with high probability. Moreover, the bound for non-convex problem becomes $\tilde{\cO}(1/\sqrt{n})$ without such assumption. Our results are derived via algorithmic stability and characterization of the empirical risk's landscape. Compared with the existing algorithmic stability based results, our bounds are dimensional insensitive and without restrictions on the algorithm's implementation, learning rate, and the number of iterations. Our bounds underscore that with locally strongly convex population risk, the models trained by any proper iterative algorithm can generalize well, even for non-convex problems, and $d$ is large.

下载PDF全文

下载文献需遵守相关版权规定

论文标题