Paper Title

The Loss Surfaces of Neural Networks with General Activation Functions

Authors

Baskerville, Nicholas P., Keating, Jonathan P., Mezzadri, Francesco, Najnudel, Joseph

Abstract

The loss surfaces of deep neural networks have been the subject of several studies, theoretical and experimental, over the last few years. One strand of work considers the complexity, in the sense of local optima, of high-dimensional random functions with the aim of informing how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al. (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.
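For readers unfamiliar with the spin glass connection mentioned in the abstract, the standard spherical p-spin glass Hamiltonian (the general definition from the spin glass literature, not a formula quoted from this paper) is

```latex
% Spherical p-spin glass Hamiltonian on N variables.
% The X_{i_1 \dots i_p} are i.i.d. standard Gaussians, and the
% configuration w is constrained to the sphere \|w\|_2^2 = N.
H_{N,p}(w) \;=\; \frac{1}{N^{(p-1)/2}}
  \sum_{i_1, \dots, i_p = 1}^{N} X_{i_1 \dots i_p}\, w_{i_1} \cdots w_{i_p},
\qquad \|w\|_2^2 = N .
```

In the setting of Choromanska et al. (2015), the training loss of a depth-H ReLU network is related, under their assumptions, to such a Hamiltonian with p = H, which is what makes complexity results for spin glasses informative about neural network loss surfaces.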
