Paper Title

Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes

Paper Authors

Feng Gao, Jincheng Yu, Hao Shen, Yu Wang, Huazhong Yang

Paper Abstract

Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots. However, the rigid projection computed by ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which can learn to distinguish and extract the scene's static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera's ego-motion and the scene's dynamic motion field. Then, we introduce an auto-selecting approach to detect the moving objects for dynamic-aware learning automatically. Empirical experiments demonstrate that our method can achieve the state-of-the-art performance on the KITTI benchmark.
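For context, the rigid projection the abstract refers to is the standard view-synthesis warping used in self-supervised depth-pose learning: a pixel $p_t$ in the target frame is reprojected into the source frame as $p_s \sim K \hat{T}_{t \to s} \hat{D}_t(p_t) K^{-1} p_t$, where $K$ is the camera intrinsics, $\hat{D}_t$ the predicted depth, and $\hat{T}_{t \to s}$ the predicted ego-motion. This mapping only holds for static scene points, which is why pixels on moving objects produce false photometric guidance. The sketch below is a minimal, hypothetical PyTorch illustration of the two-decoder MotionNet structure described in the abstract (a shared encoder followed by one head for 6-DoF ego-motion and one for a dense motion field); the `MotionNetSketch` class, the layer sizes, and the plain convolutional stack standing in for the ASANet encoder are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MotionNetSketch(nn.Module):
    """Hypothetical sketch of a MotionNet-style model: a shared encoder with
    two separate decoders, one regressing camera ego-motion and one predicting
    a dense motion field for dynamic regions (not the authors' released code)."""

    def __init__(self, in_channels=6, feat_channels=64):
        super().__init__()
        # Placeholder encoder; a plain conv stack stands in here for the
        # attentional separation-and-aggregation blocks of ASANet.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels * 2, 5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels * 2, feat_channels * 4, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Ego-motion head: global pooling followed by a 6-DoF pose regression
        # (3 translation + 3 rotation parameters).
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(feat_channels * 4, 6),
        )
        # Motion-field head: upsampling decoder producing a dense 3-channel
        # motion map for dynamic regions.
        self.motion_head = nn.Sequential(
            nn.ConvTranspose2d(feat_channels * 4, feat_channels, 4, stride=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 3, 3, padding=1),
        )

    def forward(self, image_pair):
        # image_pair: two RGB frames concatenated along channels, shape (B, 6, H, W).
        feats = self.encoder(image_pair)
        pose = self.pose_head(feats)            # (B, 6) ego-motion
        motion_field = self.motion_head(feats)  # (B, 3, H/2, W/2) dense motion field
        return pose, motion_field


if __name__ == "__main__":
    model = MotionNetSketch()
    frames = torch.randn(2, 6, 128, 416)  # random stand-in for a KITTI-sized frame pair
    pose, motion = model(frames)
    print(pose.shape, motion.shape)
```

Splitting ego-motion and motion-field prediction into separate decoders mirrors the abstract's separation of camera-induced (static) and object-induced (dynamic) motion, so that only the rigid component is used for projecting static regions.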
