Paper Title

Towards GAN Benchmarks Which Require Generalization

Paper Authors

Ishaan Gulrajani, Colin Raffel, Luke Metz

Paper Abstract

For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation and implement an example black-box metric based on these ideas. Through experimental validation we show that it can effectively measure diversity, sample quality, and generalization.

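The core idea summarized above, a neural network divergence (NND) estimated by training a critic network to distinguish model samples from data, can be illustrated with a small sketch. The snippet below is our own toy illustration, not the paper's metric: it uses a single-layer logistic "critic" instead of a deep network, plain NumPy instead of a deep-learning framework, and a hypothetical function name (`nnd_estimate`). It trains the critic on one half of each sample set and reports the score gap on the held-out halves, so a larger value indicates a larger estimated divergence between the two distributions.

```python
import numpy as np

def nnd_estimate(real_samples, model_samples, steps=500, lr=0.1, seed=0):
    """Toy neural-network-divergence estimate (illustrative only): train a
    linear critic with a logistic loss to distinguish the two sample sets,
    then report how well it separates held-out halves of each set."""
    rng = np.random.default_rng(seed)

    # Split each sample set into a critic-training half and an evaluation half.
    def split(x):
        x = rng.permutation(x)
        m = len(x) // 2
        return x[:m], x[m:]

    real_tr, real_ev = split(real_samples)
    fake_tr, fake_ev = split(model_samples)

    x_tr = np.concatenate([real_tr, fake_tr])
    y_tr = np.concatenate([np.ones(len(real_tr)), np.zeros(len(fake_tr))])

    # Linear critic trained with full-batch gradient descent on the logistic loss.
    w = np.zeros(x_tr.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_tr @ w + b)))
        grad = p - y_tr
        w -= lr * (x_tr.T @ grad) / len(x_tr)
        b -= lr * grad.mean()

    # Divergence proxy: mean critic score on real eval samples minus fake ones.
    def score(x):
        return 1.0 / (1.0 + np.exp(-(x @ w + b)))

    return score(real_ev).mean() - score(fake_ev).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    real = rng.normal(0.0, 1.0, size=(2000, 8))
    close = rng.normal(0.1, 1.0, size=(2000, 8))   # good model: small divergence
    far = rng.normal(2.0, 1.0, size=(2000, 8))     # poor model: large divergence
    print("close model:", nnd_estimate(real, close))
    print("far model:  ", nnd_estimate(real, far))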
```