Title
3D Multi-Object Tracking with Differentiable Pose Estimation
Authors
Abstract
We propose a novel approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments. To this end, we detect and reconstruct objects in each frame while predicting dense correspondence maps into a normalized object space. We leverage those correspondences to inform a graph neural network that solves for the optimal, temporally-consistent 7-DoF pose trajectories of all objects. The novelty of our method is two-fold: first, we propose a new graph-based approach for differentiable pose estimation over time to learn optimal pose trajectories; second, we present a joint formulation of reconstruction and pose estimation along the time axis for robust and geometrically consistent multi-object tracking. In order to validate our approach, we introduce a new synthetic dataset comprising 2381 unique indoor sequences with a total of 60k rendered RGB-D images for multi-object tracking, with moving objects and camera positions derived from the synthetic 3D-FRONT dataset. We demonstrate that our method improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods. In several ablations on synthetic and real-world sequences, we show that our graph-based, fully end-to-end-learnable approach yields a significant boost in tracking performance.
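To make the 7-DoF pose concrete: a 7-DoF pose is a similarity transform, i.e. one uniform scale, a 3D rotation, and a 3D translation, which is exactly what dense correspondences into a normalized object space determine for each detection. The sketch below is not the paper's learned graph solver; it is the classical closed-form Umeyama alignment, shown here only to illustrate how a single frame's 7-DoF pose could in principle be recovered from such correspondences (point arrays and function name are illustrative assumptions).

```python
import numpy as np

def umeyama_7dof(src, dst):
    """Closed-form 7-DoF (scale, rotation, translation) alignment mapping
    src points (e.g. normalized-object-space coordinates) onto dst points
    (e.g. observed camera-space points), per Umeyama (1991).
    src, dst: (N, 3) arrays of corresponding 3D points."""
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    x = src - mu_src
    y = dst - mu_dst
    cov = y.T @ x / len(src)                  # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # resolve reflection ambiguity
    R = U @ S @ Vt                            # optimal rotation
    var_src = (x ** 2).sum() / len(src)       # variance of centered src
    s = np.trace(np.diag(D) @ S) / var_src    # optimal uniform scale
    t = mu_dst - s * R @ mu_src               # optimal translation
    return s, R, t
```

In contrast to this per-frame, non-differentiable least-squares fit, the paper's graph-based formulation solves for poses jointly across time and is end-to-end learnable, which is what enables temporally consistent trajectories.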