Paper Title
There is a Singularity in the Loss Landscape
Paper Authors
Paper Abstract
Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that, as the size of the dataset increases, a point forms in parameter space at which the magnitude of the loss gradient grows without bound. Gradient descent rapidly brings the network close to this singularity, and further training takes place near it. This singularity explains several phenomena recently observed in the Hessian of neural network loss functions, such as training at the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it accounts for most of the gradient.
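As one illustration of the abstract's claim about gradient concentration, the sketch below (not the authors' code; the toy regression data, small MLP, subspace size k=5, and deflated power iteration are all illustrative assumptions) estimates in PyTorch what fraction of the squared gradient norm lies in the span of the top few Hessian eigenvectors, using Hessian-vector products.

```python
# Minimal sketch: estimate the fraction of the loss gradient lying in the
# top-k Hessian eigensubspace. Model, data, and k are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression problem and a small MLP (assumptions for illustration).
X = torch.randn(256, 10)
y = torch.randn(256, 1)
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
params = [p for p in model.parameters() if p.requires_grad]

def loss_fn():
    return nn.functional.mse_loss(model(X), y)

def flat_grad(create_graph=False):
    grads = torch.autograd.grad(loss_fn(), params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

def hvp(v):
    # Hessian-vector product H @ v via double backpropagation.
    g = flat_grad(create_graph=True)
    hv = torch.autograd.grad(torch.dot(g, v), params)
    return torch.cat([h.reshape(-1) for h in hv]).detach()

def top_eigvecs(k=5, iters=100):
    # Approximate the top-k Hessian eigenvectors (by absolute eigenvalue)
    # with deflated power iteration; crude but sufficient for a sketch.
    n = sum(p.numel() for p in params)
    vecs = []
    for _ in range(k):
        v = torch.randn(n)
        v /= v.norm()
        for _ in range(iters):
            w = hvp(v)
            for u in vecs:          # deflate directions already found
                w -= torch.dot(w, u) * u
            v = w / (w.norm() + 1e-12)
        vecs.append(v)
    return torch.stack(vecs)        # shape (k, n)

g = flat_grad().detach()
V = top_eigvecs(k=5)
proj = V @ g                        # gradient components along top eigenvectors
frac = (proj.norm() ** 2 / (g.norm() ** 2 + 1e-12)).item()
print(f"fraction of squared gradient norm in top-5 Hessian subspace: {frac:.3f}")
```

On a randomly initialized toy model this fraction is only indicative; the abstract's claim concerns networks that gradient descent has brought near the singularity, where the top subspace is reported to carry most of the gradient while contributing little to learning.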