Paper Title

Annihilation of Spurious Minima in Two-Layer ReLU Networks

Authors

Yossi Arjevani, Michael Field

Abstract

We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. Use is made of the rich symmetry structure to develop a novel set of tools for studying the mechanism by which over-parameterization annihilates spurious minima. Sharp analytic estimates are obtained for the loss and the Hessian spectrum at different minima, and it is proved that adding neurons can turn symmetric spurious minima into saddles; minima of lesser symmetry require more neurons. Using Cauchy's interlacing theorem, we prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function. This analytic approach uses techniques, new to the field, from algebraic geometry, representation theory and symmetry breaking, and confirms rigorously the effectiveness of over-parameterization in making the associated loss landscape accessible to gradient-based methods. For a fixed number of neurons and inputs, the spectral results remain true under symmetry-breaking perturbations of the target.
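To make the setting concrete, the following is a minimal sketch of the student-teacher objective the abstract describes, assuming Gaussian inputs as in the authors' closely related work; the symbols k (student width), d (target width), w_i, and v_j are illustrative and the paper's exact normalization may differ:

\[
\mathcal{L}(w_1,\dots,w_k) \;=\; \mathbb{E}_{x \sim \mathcal{N}(0, I_d)}\!\left[\left(\sum_{i=1}^{k} \max(w_i^{\top} x,\, 0) \;-\; \sum_{j=1}^{d} \max(v_j^{\top} x,\, 0)\right)^{\!2}\right],
\]

with over-parameterization corresponding to k > d. Cauchy's interlacing theorem, which the abstract invokes, states that if A is a symmetric n-by-n matrix and B is a principal submatrix of order n - 1, then

\[
\lambda_1(A) \;\ge\; \lambda_1(B) \;\ge\; \lambda_2(A) \;\ge\; \cdots \;\ge\; \lambda_{n-1}(B) \;\ge\; \lambda_n(A).
\]

In particular, \(\lambda_n(A) \le \lambda_{n-1}(B)\), so a negative Hessian eigenvalue on a symmetry-adapted subspace forces a negative eigenvalue of the full Hessian, i.e. a descent direction at the critical point.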
