Sim2Real for Self-Supervised Monocular Depth and Segmentation

Paper title

Sim2Real for Self-Supervised Monocular Depth and Segmentation

Authors

Nithin Raghavan, Punarjay Chakravarty, Shubham Shrivastava

Abstract

Image-based learning methods for autonomous vehicle perception tasks require large quantities of labelled, real data in order to train properly without overfitting, and such data is often very costly to collect. While simulated data can help mitigate these costs, networks trained in the simulation domain usually fail to perform adequately when applied to images in the real domain. Recent advances in domain adaptation indicate that a shared latent space assumption can help bridge the gap between the simulation and real domains, allowing the predictive capabilities of a network to transfer from the simulation domain to the real domain. We demonstrate that a twin VAE-based architecture with a shared latent space and auxiliary decoders is able to bridge the sim2real gap without requiring any paired, ground-truth data in the real domain. Using only paired, ground-truth data in the simulation domain, this architecture has the potential to generate perception outputs such as depth and segmentation maps. We compare this method to networks trained in a supervised manner to indicate the merit of these results.
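The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes flattened images, single untrained linear layers in place of convolutional networks, and hypothetical names and sizes (`Encoder`, `forward`, `IMG_DIM`, `LATENT_DIM`, `NUM_CLASSES`). It only shows how two domain-specific encoders can feed one shared latent space that image decoders and auxiliary depth and segmentation decoders all read from.

```python
import math
import random

random.seed(0)

# Toy sizes, assumed for illustration only.
IMG_DIM, LATENT_DIM, NUM_CLASSES = 12, 4, 3


def linear(in_dim, out_dim):
    """Random weight matrix (out_dim x in_dim) standing in for a trained layer."""
    return [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights, x):
    """Apply one bias-free linear layer to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class Encoder:
    """Domain-specific VAE encoder: image -> (z, mu, logvar)."""

    def __init__(self):
        self.w_mu = linear(IMG_DIM, LATENT_DIM)
        self.w_logvar = linear(IMG_DIM, LATENT_DIM)

    def __call__(self, x):
        mu = matvec(self.w_mu, x)
        logvar = matvec(self.w_logvar, x)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
             for m, lv in zip(mu, logvar)]
        return z, mu, logvar


def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# One encoder per domain; every decoder reads the same latent space.
encode_sim, encode_real = Encoder(), Encoder()
decode_sim = linear(LATENT_DIM, IMG_DIM)
decode_real = linear(LATENT_DIM, IMG_DIM)
decode_depth = linear(LATENT_DIM, IMG_DIM)               # one depth value per pixel
decode_seg = linear(LATENT_DIM, IMG_DIM * NUM_CLASSES)   # class logits per pixel


def forward(x, encoder):
    """Encode an image from either domain and run all decoders on its latent code."""
    z, mu, logvar = encoder(x)
    return {
        "z": z,
        "kl": kl_divergence(mu, logvar),
        "sim_image": matvec(decode_sim, z),
        "real_image": matvec(decode_real, z),
        "depth": matvec(decode_depth, z),
        "segmentation": matvec(decode_seg, z),
    }


sim_out = forward([random.random() for _ in range(IMG_DIM)], encode_sim)
real_out = forward([random.random() for _ in range(IMG_DIM)], encode_real)
```

In a setup like this, depth and segmentation losses would be computed only on `sim_out`, where ground truth exists, while `real_out` yields depth and segmentation predictions for real images through the same auxiliary decoders. The training losses themselves are not specified in the abstract and are left out of the sketch.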
