Consistency of Implicit and Explicit Features Matters for Monocular 3D Object Detection

Paper Title

Consistency of Implicit and Explicit Features Matters for Monocular 3D Object Detection

Authors

Ye, Qian; Jiang, Ling; Zhen, Wang; Du, Yuyang

Abstract

Low-cost autonomous agents, including autonomous driving vehicles, chiefly adopt monocular 3D object detection to perceive the surrounding environment. This paper studies 3D intermediate representation methods, which generate intermediate 3D features for subsequent tasks. For example, the 3D features can be taken as input not only for detection but also for end-to-end prediction and/or planning that require a bird's-eye-view (BEV) feature representation. In this study, we found that, when generating the 3D representation, previous methods do not maintain consistency between the objects' implicit poses in the latent space, especially their orientations, and the explicitly observed poses in Euclidean space, which can substantially hurt model performance. To tackle this problem, we present a novel monocular detection method, the first to be explicitly aware of object poses so as to guarantee that they are consistent between the implicit and explicit features. Second, we introduce a local ray attention mechanism to efficiently transform image features into voxels at accurate 3D locations. Third, we propose a handcrafted Gaussian positional encoding function, which outperforms the sinusoidal encoding function while retaining the benefit of being continuous. Results show that our method improves on the state-of-the-art 3D intermediate representation method by 3.15%. As of the time of submission, we ranked 1st among all reported monocular methods on both the 3D and BEV detection benchmarks of the KITTI leaderboard.
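The abstract does not give the form of the authors' Gaussian positional encoding, so the following is only a generic sketch contrasting a standard sinusoidal encoding with one common Gaussian (radial basis) encoding of a scalar position such as depth along a camera ray. The function names, the evenly spaced kernel centres, the default bandwidth, and the 0-40 m depth range are all hypothetical choices for illustration, not the paper's function.

```python
import math
from typing import List, Optional


def sinusoidal_encoding(x: float, dim: int = 8, base: float = 10000.0) -> List[float]:
    """Transformer-style sinusoidal encoding of a scalar position x.

    Produces interleaved (sin, cos) pairs at geometrically spaced frequencies.
    """
    enc: List[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        enc.append(math.sin(x * freq))
        enc.append(math.cos(x * freq))
    return enc


def gaussian_encoding(
    x: float,
    lo: float,
    hi: float,
    num_centers: int = 8,
    sigma: Optional[float] = None,
) -> List[float]:
    """Hypothetical Gaussian (RBF) encoding of a scalar position x.

    Each output channel is the response of a Gaussian kernel centred at one of
    `num_centers` evenly spaced points in [lo, hi]. The result is continuous
    and smooth in x, and nearby positions get similar codes.
    """
    if num_centers < 2:
        raise ValueError("num_centers must be at least 2")
    step = (hi - lo) / (num_centers - 1)
    # Assumption: bandwidth equals the spacing between kernel centres.
    if sigma is None:
        sigma = step
    centers = [lo + k * step for k in range(num_centers)]
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


if __name__ == "__main__":
    # Encode a depth of 20 m along a camera ray, assuming a 0-40 m range.
    print([round(v, 3) for v in gaussian_encoding(20.0, 0.0, 40.0, num_centers=9)])
    print([round(v, 3) for v in sinusoidal_encoding(20.0, dim=8)])
```

In this sketch each Gaussian channel responds strongly only near its own centre, so the code is localized in position, whereas every sinusoidal channel is periodic in x. Both are continuous; the abstract does not state which property explains the reported gain.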
