Paper Title

Unsupervised multi-modal Styled Content Generation

Paper Authors

Sendik, Omry; Lischinski, Dani; Cohen-Or, Daniel

Paper Abstract

The emergence of deep generative models has recently enabled the automatic generation of massive amounts of graphical content, both in 2D and in 3D. Generative Adversarial Networks (GANs) and style control mechanisms, such as Adaptive Instance Normalization (AdaIN), have proved particularly effective in this context, culminating in the state-of-the-art StyleGAN architecture. While such models are able to learn diverse distributions, provided a sufficiently large training set, they are not well-suited for scenarios where the distribution of the training data exhibits a multi-modal behavior. In such cases, reshaping a uniform or normal distribution over the latent space into a complex multi-modal distribution in the data domain is challenging, and the generator might fail to sample the target distribution well. Furthermore, existing unsupervised generative models are not able to control the mode of the generated samples independently of the other visual attributes, despite the fact that they are typically disentangled in the training data. In this paper, we introduce UMMGAN, a novel architecture designed to better model multi-modal distributions, in an unsupervised fashion. Building upon the StyleGAN architecture, our network learns multiple modes, in a completely unsupervised manner, and combines them using a set of learned weights. We demonstrate that this approach is capable of effectively approximating a complex distribution as a superposition of multiple simple ones. We further show that UMMGAN effectively disentangles between modes and style, thereby providing an independent degree of control over the generated content.
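The abstract describes learning several mode codes without supervision and combining them with learned weights before injecting the result into a StyleGAN-like generator through AdaIN. Below is a minimal PyTorch sketch of that general idea, not the authors' implementation: the module names (ModeMixer, adain), the number of modes, the dimensions, and the softmax-based mixing rule are all illustrative assumptions.

```python
# Hedged sketch (not the paper's code): a complex distribution is approximated by
# mixing K learned "mode" codes with weights predicted from the latent z; the
# resulting style code then modulates feature maps via AdaIN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModeMixer(nn.Module):
    """Maps a latent z to a style code w lying in the convex hull of K learned mode codes."""
    def __init__(self, z_dim=512, w_dim=512, num_modes=4):
        super().__init__()
        # K learned mode codes (one per mode, learned without supervision).
        self.mode_codes = nn.Parameter(torch.randn(num_modes, w_dim))
        # Small MLP predicting per-mode mixing weights from z.
        self.weight_net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, num_modes),
        )

    def forward(self, z):
        # Softmax keeps the mixing weights positive and summing to one.
        weights = F.softmax(self.weight_net(z), dim=-1)   # (B, K)
        w = weights @ self.mode_codes                      # (B, w_dim)
        return w, weights

def adain(content, w, to_scale, to_bias):
    """Adaptive Instance Normalization: normalize per-channel statistics,
    then re-scale and shift them using parameters predicted from w."""
    mean = content.mean(dim=(2, 3), keepdim=True)
    std = content.std(dim=(2, 3), keepdim=True) + 1e-8
    normalized = (content - mean) / std
    scale = to_scale(w).unsqueeze(-1).unsqueeze(-1)        # (B, C, 1, 1)
    bias = to_bias(w).unsqueeze(-1).unsqueeze(-1)
    return normalized * scale + bias

# Usage example with toy shapes.
mixer = ModeMixer()
z = torch.randn(8, 512)
w, mode_weights = mixer(z)                                 # style code and mode weights
feat = torch.randn(8, 64, 32, 32)                          # a generator feature map
to_scale, to_bias = nn.Linear(512, 64), nn.Linear(512, 64)
styled = adain(feat, w, to_scale, to_bias)
print(styled.shape, mode_weights.shape)                    # (8, 64, 32, 32), (8, 4)
```

In this sketch the mode weights and the style code are produced by separate, explicit quantities, which is one simple way to picture the claimed separation between the mode of a sample and its other visual attributes; the paper itself should be consulted for the actual architecture and training procedure.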
