Title
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
Authors
Abstract
In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percent points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.
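The abstract describes three ingredients: withholding random patches of the target image, pseudo-labels from an exponential-moving-average (EMA) teacher on the complete image, and a consistency loss between the two. A minimal NumPy sketch of these pieces follows; the patch size, mask ratio, and EMA momentum values are illustrative assumptions, not the paper's settings (the actual implementation is at the linked repository).

```python
import numpy as np

def mask_patches(img, patch_size=16, mask_ratio=0.5, rng=None):
    """Withhold random square patches of an image (MIC-style masking).

    img: (H, W, C) array with H, W divisible by patch_size.
    Returns the masked image and the boolean keep-mask at pixel level.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    h, w, _ = img.shape
    # Draw one keep/withhold decision per patch, then upsample to pixels.
    keep = rng.random((h // patch_size, w // patch_size)) > mask_ratio
    keep = np.repeat(np.repeat(keep, patch_size, axis=0), patch_size, axis=1)
    return img * keep[..., None], keep

def consistency_loss(student_probs, pseudo_labels):
    """Cross-entropy between the student's masked-image class probabilities
    and the teacher's pseudo-labels from the complete image."""
    n = len(pseudo_labels)
    return -np.mean(np.log(student_probs[np.arange(n), pseudo_labels] + 1e-8))

def ema_update(teacher_params, student_params, alpha=0.999):
    """Exponential-moving-average teacher update after each student step."""
    return [alpha * t + (1 - alpha) * s
            for t, s in zip(teacher_params, student_params)]
```

In a training step, the teacher would predict pseudo-labels on the full target image, the student would be fed the output of `mask_patches`, and minimizing `consistency_loss` forces the student to infer the withheld regions from their spatial context.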