Paper Title

Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis

Authors

Prateek Katiyar, Anna Khoreva

Abstract

Despite data augmentation being a de facto technique for boosting the performance of deep neural networks, little attention has been paid to developing augmentation strategies for generative adversarial networks (GANs). To this end, we introduce a novel augmentation scheme designed specifically for GAN-based semantic image synthesis models. We propose to randomly warp object shapes in the semantic label maps used as input to the generator. The local shape discrepancies between the warped label maps and the non-warped ground-truth images enable the GAN to better learn the structural and geometric details of the scene, and thus to improve the quality of the generated images. While benchmarking the augmented GAN models against their vanilla counterparts, we discover that the quantitative metrics reported in previous semantic image synthesis studies are strongly biased towards specific semantic classes, as they are derived via an external pre-trained segmentation network. We therefore propose to improve the established semantic image synthesis evaluation scheme by separately analyzing the performance of generated images on the biased and unbiased classes of the given segmentation network. Finally, we show strong quantitative and qualitative improvements obtained with our augmentation scheme, on both class splits, using state-of-the-art semantic image synthesis models across three different datasets. On average across the COCO-Stuff, ADE20K and Cityscapes datasets, the augmented models outperform their vanilla counterparts by ~3 mIoU and ~10 FID points.
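The abstract describes the augmentation only at a high level. Below is a minimal sketch of the core idea, randomly warping object shapes in a semantic label map before it is fed to the generator. It assumes an elastic-style warp built from a smoothed random displacement field, resampled with nearest-neighbour lookup so class ids stay discrete; the function name and the `alpha`/`sigma` strength parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp_label_map(label_map, alpha=30.0, sigma=8.0, seed=None):
    """Randomly warp an integer semantic label map of shape (H, W).

    A sketch of the augmentation idea: alpha controls displacement
    strength, sigma controls how local/coherent the warp is (both
    values here are illustrative assumptions).
    """
    rng = np.random.default_rng(seed)
    h, w = label_map.shape
    # Smooth per-pixel random displacements so the warp deforms object
    # boundaries locally instead of producing pixel-level noise.
    dx = gaussian_filter(rng.uniform(-1.0, 1.0, size=(h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1.0, 1.0, size=(h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + dy, xs + dx])
    # order=0 (nearest neighbour) keeps the resampled map a valid
    # label map with unchanged, discrete class ids.
    return map_coordinates(label_map, coords, order=0, mode="nearest")

# Usage example on a Cityscapes-sized label map:
labels = np.random.randint(0, 35, size=(256, 512))
warped = warp_label_map(labels, seed=0)
```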
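Similarly, a tiny sketch of the proposed evaluation refinement: classes on which the external pre-trained segmentation network already performs poorly on real images are treated as "biased", and the mIoU of generated images is then reported separately per split. The 0.5 threshold and the function names are assumptions for illustration; the abstract does not specify the exact split criterion.

```python
import numpy as np

def split_classes(per_class_iou_real, threshold=0.5):
    """Split class ids into (biased, unbiased) using the segmentation
    network's per-class IoU on real validation images."""
    biased = [c for c, iou in per_class_iou_real.items() if iou < threshold]
    unbiased = [c for c, iou in per_class_iou_real.items() if iou >= threshold]
    return biased, unbiased

def miou_on_split(per_class_iou_generated, class_ids):
    """mIoU of generated images restricted to one class split."""
    vals = [per_class_iou_generated[c] for c in class_ids]
    return float(np.mean(vals)) if vals else float("nan")
```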
