Paper Title
Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
Paper Authors
Paper Abstract
Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from an algorithmic perspective, that is, to algorithms of GR that efficiently improve performance. In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of GR. Next, we show that the finite-difference computation also works better in the sense of generalization performance. We theoretically analyze a solvable model, a diagonal linear network, and clarify that GR has a desirable implicit bias toward the so-called rich regime, and that the finite-difference computation strengthens this bias. Furthermore, finite-difference GR is closely related to other algorithms based on iterative ascent and descent steps for exploring flat minima. In particular, we reveal that the flooding method can perform finite-difference GR in an implicit way. Thus, this work broadens our understanding of GR for both practice and theory.
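To make the ascent-and-descent scheme mentioned in the abstract concrete: writing the GR objective as L(theta) + (gamma/2) * ||grad L(theta)||^2, its gradient contains a Hessian-vector product H(theta) * grad L(theta), which a two-point finite difference approximates as (grad L(theta + eps * grad L(theta)) - grad L(theta)) / eps, i.e., one gradient at an ascent point combined with the gradient at the current point. The following minimal PyTorch sketch illustrates this idea under those assumptions; the function name finite_difference_gr_step and the hyperparameter values are illustrative and are not taken from the paper.

    import torch

    def finite_difference_gr_step(model, loss_fn, x, y, lr=0.1, gamma=0.01, eps=0.01):
        # Illustrative sketch: one SGD step on L(theta) + (gamma/2)*||grad L(theta)||^2,
        # with the penalty's Hessian-vector product replaced by a finite difference.
        params = [p for p in model.parameters() if p.requires_grad]

        # Gradient of the training loss at the current parameters theta.
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)

        # Gradient ascent step: theta_plus = theta + eps * grad L(theta).
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=eps)

        # Gradient of the loss at the ascent point theta_plus.
        loss_plus = loss_fn(model(x), y)
        grads_plus = torch.autograd.grad(loss_plus, params)

        # Undo the ascent step, then apply the combined descent update:
        # grad[L + (gamma/2)*||grad L||^2] ~ (1 - gamma/eps)*grad L(theta) + (gamma/eps)*grad L(theta_plus).
        with torch.no_grad():
            for p, g, g_plus in zip(params, grads, grads_plus):
                p.sub_(g, alpha=eps)
                p.sub_((1.0 - gamma / eps) * g + (gamma / eps) * g_plus, alpha=lr)

        return loss.item()

    # Example usage with hypothetical toy data:
    # model = torch.nn.Linear(10, 1)
    # x, y = torch.randn(32, 10), torch.randn(32, 1)
    # finite_difference_gr_step(model, torch.nn.functional.mse_loss, x, y)

Note that in this sketch, setting gamma equal to eps makes the update reduce to the gradient evaluated purely at the ascent point, which hints at the connection to the other ascent-and-descent algorithms for flat minima mentioned in the abstract; the precise correspondence is established in the paper itself.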