Title
3D Multi-Object Tracking with Differentiable Pose Estimation
Authors
Abstract
We propose a novel approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments. To this end, we detect and reconstruct objects in each frame while predicting dense correspondence maps into a normalized object space. We leverage those correspondences to inform a graph neural network that solves for the optimal, temporally-consistent 7-DoF pose trajectories of all objects. The novelty of our method is two-fold: first, we propose a new graph-based approach for differentiable pose estimation over time to learn optimal pose trajectories; second, we present a joint formulation of reconstruction and pose estimation along the time axis for robust and geometrically consistent multi-object tracking. In order to validate our approach, we introduce a new synthetic dataset comprising 2381 unique indoor sequences with a total of 60k rendered RGB-D images for multi-object tracking, with moving objects and camera positions derived from the synthetic 3D-FRONT dataset. We demonstrate that our method improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods. In several ablations on synthetic and real-world sequences, we show that our graph-based, fully end-to-end-learnable approach yields a significant boost in tracking performance.
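To make the 7-DoF pose concrete: a 7-DoF pose is a similarity transform, i.e. one uniform scale, a 3D rotation, and a 3D translation, which is exactly what dense correspondences into a normalized object space determine for each detection. The sketch below is not the paper's learned graph solver; it is the classical closed-form Umeyama alignment, shown here only to illustrate how a single frame's 7-DoF pose could in principle be recovered from such correspondences (point arrays and function name are illustrative assumptions).

```python
import numpy as np

def umeyama_7dof(src, dst):
    """Closed-form 7-DoF (scale, rotation, translation) alignment mapping
    src points (e.g. normalized-object-space coordinates) onto dst points
    (e.g. observed camera-space points), per Umeyama (1991).
    src, dst: (N, 3) arrays of corresponding 3D points."""
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    x = src - mu_src
    y = dst - mu_dst
    cov = y.T @ x / len(src)                  # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # resolve reflection ambiguity
    R = U @ S @ Vt                            # optimal rotation
    var_src = (x ** 2).sum() / len(src)       # variance of centered src
    s = np.trace(np.diag(D) @ S) / var_src    # optimal uniform scale
    t = mu_dst - s * R @ mu_src               # optimal translation
    return s, R, t
```

In contrast to this per-frame, non-differentiable least-squares fit, the paper's graph-based formulation solves for poses jointly across time and is end-to-end learnable, which is what enables temporally consistent trajectories.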