Paper Title
Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis
Paper Authors
Paper Abstract
Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or from pre-trained models learned purely from point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training technique for better model initialization by utilizing the multi-view rendering of the 3D data. Our pre-training is self-supervised by a local pixel/point level correspondence loss computed from perspective projection and a global image/point cloud level loss based on knowledge distillation, thus effectively improving upon popular point cloud networks, including PointNet, DGCNN and SR-UNet. These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks. We also analyze the benefits of synthetic and real data for pre-training, and observe that pre-training on synthetic data is also useful for high-level downstream tasks. Code and pre-trained models are available at https://github.com/VinAIResearch/selfsup_pcd.
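The local pixel/point correspondence loss described above requires projecting each 3D point into a rendered view to find its matching pixel. A minimal sketch of that perspective-projection step, assuming a standard pinhole camera model (the intrinsics `K`, pose `R`, `t`, and the helper `project_points` are illustrative, not taken from the paper's code):

```python
import numpy as np

def project_points(points, K, R, t):
    """Project N x 3 world-space points to 2D pixel coordinates
    using a pinhole camera with intrinsics K and extrinsics (R, t)."""
    cam = points @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                  # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective divide -> (u, v)

# Toy example: identity pose, focal length 100, principal point (64, 64).
K = np.array([[100.0,   0.0, 64.0],
              [  0.0, 100.0, 64.0],
              [  0.0,   0.0,  1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 2.0],    # lies on the optical axis
                   [0.5, 0.5, 2.0]])
pixels = project_points(points, K, R, t)
# pixels[0] lands on the principal point (64, 64); pixels[1] at (89, 89).
```

In the self-supervised setup, such projected pixel locations would pair each point's feature with the image feature at the corresponding pixel, giving the local correspondence targets; the global image/point-cloud loss then distills knowledge at the whole-sample level.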