Paper Title
Contrastive Divergence Learning is a Time Reversal Adversarial Game
Paper Authors
Paper Abstract
Contrastive divergence (CD) learning is a classical method for fitting unnormalized statistical models to data samples. Despite its widespread use, the convergence properties of this algorithm are still not well understood. The main source of difficulty is an unjustified approximation that has been used to derive the gradient of the loss. In this paper, we present an alternative derivation of CD that does not require any approximation and sheds new light on the objective that is actually being optimized by the algorithm. Specifically, we show that CD is an adversarial learning procedure, where a discriminator attempts to classify whether a Markov chain generated from the model has been time-reversed. Thus, although predating generative adversarial networks (GANs) by more than a decade, CD is, in fact, closely related to these techniques. Our derivation settles well with previous observations, which have concluded that CD's update steps cannot be expressed as the gradients of any fixed objective function. In addition, as a byproduct, our derivation reveals a simple correction that can be used as an alternative to Metropolis-Hastings rejection, which is required when the underlying Markov chain is inexact (e.g., when using Langevin dynamics with a large step size).
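To make the setting concrete, below is a minimal PyTorch sketch (not code from the paper) of the classical CD-k update for an unnormalized energy-based model, using unadjusted Langevin dynamics as the Markov transition. Detaching the negative samples before backpropagation corresponds to the approximate gradient discussed above; the time-reversal discriminator view and the proposed correction to Metropolis-Hastings rejection are not implemented here. All names (EnergyNet, langevin_step, cd_update) and hyperparameters are illustrative assumptions, not the authors' implementation.

# Minimal sketch of CD-k training of an energy-based model with Langevin transitions.
# Assumed/illustrative code; not from the paper.
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Small MLP assigning a scalar energy E_theta(x) to each sample x."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def langevin_step(energy, x, step_size=1e-2):
    """One unadjusted Langevin transition targeting p(x) ∝ exp(-E(x)).
    For large step sizes the chain is inexact, the regime where the paper's
    correction (or Metropolis-Hastings rejection) becomes relevant."""
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(energy(x).sum(), x)[0]
    noise = torch.randn_like(x)
    return (x - 0.5 * step_size * grad + step_size ** 0.5 * noise).detach()

def cd_update(energy, optimizer, x_data, k=1, step_size=1e-2):
    """Classical CD-k step: start the chain at the data, run k transitions,
    then lower the energy of data and raise the energy of the chain's endpoint.
    The negatives are detached, i.e. their dependence on the parameters is
    ignored -- exactly the approximation the paper's derivation revisits."""
    x_neg = x_data.clone()
    for _ in range(k):
        x_neg = langevin_step(energy, x_neg, step_size)
    loss = energy(x_data).mean() - energy(x_neg).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    energy = EnergyNet(dim=2)
    opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
    for it in range(200):
        x = torch.randn(128, 2) * 0.5 + 1.0  # toy data distribution
        cd_update(energy, opt, x, k=1)

Note that the quantity minimized here is only a surrogate: because the negatives are detached, the parameter update is not the gradient of any fixed objective, which is consistent with the observations cited in the abstract.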