Title

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

Authors

Shengheng Deng, Zhihao Liang, Lin Sun, Kui Jia

Abstract

Detecting objects from LiDAR point clouds is of tremendous significance in autonomous driving. In spite of good progress, accurate and reliable 3D detection is yet to be achieved due to the sparsity and irregularity of LiDAR point clouds. Among existing strategies, multi-view methods have shown great promise by leveraging the more comprehensive information from both bird's eye view (BEV) and range view (RV). These multi-view methods either refine the proposals predicted from single view via fused features, or fuse the features without considering the global spatial context; their performance is limited consequently. In this paper, we propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one. Thanks to the learned attention mechanism, VISTA can produce fused features of high quality for prediction of proposals. We decouple the classification and regression tasks in VISTA, and an additional constraint of attention variance is applied that enables the attention module to focus on specific targets instead of generic points. We conduct thorough experiments on the benchmarks of nuScenes and Waymo; results confirm the efficacy of our designs. At the time of submission, our method achieves 63.0% in overall mAP and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods by up to 24% in safety-crucial categories such as cyclist. The source code in PyTorch is available at https://github.com/Gorilla-Lab-SCUT/VISTA.
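The abstract describes VISTA as a plug-and-play cross-view attention module in which the per-token MLP projections of standard attention are replaced with convolutional ones, plus a variance constraint that makes the attention concentrate on specific targets. The sketch below is a minimal, hypothetical PyTorch illustration of those two ideas, not the authors' implementation: the module name, channel sizes, 3x3 kernel choice, and the form of the variance penalty are all assumptions made for demonstration; the actual code is in the linked repository.

```python
# Hypothetical sketch of cross-view attention with convolutional projections,
# in the spirit of the abstract's description. All names, shapes, and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class ConvCrossViewAttention(nn.Module):
    """Queries from one view (e.g. BEV) attend to keys/values from another
    view (e.g. RV). The usual per-token linear (MLP) projections are replaced
    with 3x3 convolutions, so each query/key embedding aggregates local
    spatial context before global cross-view attention is computed."""

    def __init__(self, channels: int = 128):
        super().__init__()
        self.q_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.k_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.v_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.scale = channels ** -0.5

    def forward(self, bev_feat: torch.Tensor, rv_feat: torch.Tensor):
        # bev_feat: (B, C, Hb, Wb) bird's eye view features
        # rv_feat:  (B, C, Hr, Wr) range view features
        b, c, hb, wb = bev_feat.shape
        q = self.q_conv(bev_feat).flatten(2).transpose(1, 2)  # (B, Hb*Wb, C)
        k = self.k_conv(rv_feat).flatten(2)                   # (B, C, Hr*Wr)
        v = self.v_conv(rv_feat).flatten(2).transpose(1, 2)   # (B, Hr*Wr, C)
        # Global cross-view attention: every BEV location may aggregate
        # information from every RV location.
        attn = torch.softmax((q @ k) * self.scale, dim=-1)    # (B, Hb*Wb, Hr*Wr)
        fused = (attn @ v).transpose(1, 2).reshape(b, c, hb, wb)
        return fused, attn


def attention_focus_penalty(attn: torch.Tensor) -> torch.Tensor:
    # One plausible form of a variance-based constraint: a near-uniform
    # attention row has low variance, so penalizing low variance pushes the
    # module to concentrate its weights on specific targets. This is a guess
    # at the idea, not the paper's exact loss.
    return (-attn.var(dim=-1)).mean()


if __name__ == "__main__":
    module = ConvCrossViewAttention(channels=128)
    bev = torch.randn(2, 128, 32, 32)  # assumed BEV feature map
    rv = torch.randn(2, 128, 16, 64)   # assumed RV feature map
    fused, attn = module(bev, rv)
    print(fused.shape, attention_focus_penalty(attn).item())
```

The abstract also notes that classification and regression are decoupled in VISTA; in a sketch like this, that would amount to two separate attention modules of this kind, each producing fused features for its own task head.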
