Paper Title

Population Gradients improve performance across data-sets and architectures in object classification

Paper Authors

Yurika Sakai, Andrey Kormilitzin, Qiang Liu, Alejo Nevado-Holgado

Paper Abstract

The most successful methods, such as ReLU transfer functions, batch normalization, Xavier initialization, dropout, learning rate decay, or dynamic optimizers, have become standards in the field, particularly due to their ability to increase the performance of Neural Networks (NNs) significantly and in almost all situations. Here we present a new method to calculate the gradients while training NNs, and show that it significantly improves final performance across architectures, data-sets, hyper-parameter values, training lengths, and model sizes, including when it is combined with other common performance-improving methods (such as the ones mentioned above). Besides being effective in the wide array of situations that we have tested, the increase in performance (e.g. F1) it provides is as high as or higher than that of all the other widespread performance-improving methods that we have compared against. We call our method Population Gradients (PG), and it consists of using a population of NNs to calculate a non-local estimation of the gradient, which is closer to the theoretical exact gradient of the error function (i.e. the one obtainable only with an infinitely large data-set) than the empirical gradient (i.e. the one obtained with the real, finite data-set).
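The abstract describes PG only at a high level. Below is a minimal, hypothetical sketch of the general idea in PyTorch: averaging the empirical gradients of a population of randomly perturbed copies of a network to obtain a smoother, non-local gradient estimate. The function name `population_gradient` and the hyper-parameters `n_pop` and `sigma` are illustrative assumptions, not the authors' published algorithm.

```python
# Hypothetical sketch of a population-based gradient estimate: average the
# gradients of several perturbed copies of the network. This illustrates the
# general idea from the abstract, not the paper's exact method.
import copy
import torch
import torch.nn as nn

def population_gradient(model, loss_fn, x, y, n_pop=8, sigma=0.01):
    """Average gradients over a population of perturbed copies of `model`."""
    avg_grads = [torch.zeros_like(p) for p in model.parameters()]
    for _ in range(n_pop):
        member = copy.deepcopy(model)
        # Perturb each parameter of the population member with Gaussian noise.
        with torch.no_grad():
            for p in member.parameters():
                p.add_(sigma * torch.randn_like(p))
        # Empirical gradient of this member on the mini-batch.
        loss = loss_fn(member(x), y)
        grads = torch.autograd.grad(loss, list(member.parameters()))
        for acc, g in zip(avg_grads, grads):
            acc.add_(g / n_pop)
    # Write the averaged (non-local) gradient back into the original model,
    # so a standard optimizer step can use it.
    for p, g in zip(model.parameters(), avg_grads):
        p.grad = g.clone()

# Usage: compute the population gradient, then step the optimizer as usual.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
population_gradient(model, nn.CrossEntropyLoss(), x, y)
opt.step()
```

Under these assumptions, the averaged gradient over the population acts as a smoothed estimate of the loss surface's slope, which is the intuition the abstract gives for why PG lies closer to the theoretical exact gradient than the single-network empirical gradient.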
