Paper Title
Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding
Paper Authors
Paper Abstract
Semantic segmentation of raw 3D point clouds is an essential component of 3D scene analysis, but it poses several challenges, primarily due to the non-Euclidean nature of 3D point clouds. Although several deep learning-based approaches have been proposed to address this task, almost all of them emphasize the latent (global) feature representations produced by traditional convolutional neural networks (CNNs), resulting in severe loss of spatial information; they thus fail to model the geometry of the underlying 3D objects, which plays an important role in remote sensing 3D scenes. In this letter, we propose an alternative approach that overcomes the limitations of CNN-based methods by encoding the spatial features of raw 3D point clouds into undirected symmetric graph models. These encodings are then combined with high-dimensional feature vectors extracted by a traditional CNN in a localized graph convolution operator that outputs the required 3D segmentation map. We have performed experiments on two standard benchmark datasets (an outdoor aerial remote sensing dataset and an indoor synthetic dataset). The proposed method achieves accuracy on par with the state of the art, with improved training time and model stability, indicating strong potential for further research toward a generalized state-of-the-art method for 3D scene understanding.
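The abstract's core idea (encoding a raw point cloud as an undirected symmetric graph and applying a localized graph convolution to per-point CNN features) can be illustrated with a minimal sketch. This is not the paper's implementation: the k-NN graph construction, mean-neighbor aggregation, `k`, and feature dimensions are all illustrative assumptions standing in for the authors' actual operator.

```python
import numpy as np

def knn_graph(points, k=6):
    """Build a symmetric, undirected k-NN adjacency matrix from raw 3D points
    (illustrative choice; the paper's graph construction may differ)."""
    n = len(points)
    # Pairwise squared Euclidean distances via broadcasting.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-edges
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:
            adj[i, j] = adj[j, i] = 1.0  # symmetrize: edge goes both ways
    return adj

def localized_graph_conv(features, adj, weight):
    """One localized graph convolution: average each point's graph-neighbor
    features, then apply a learned linear projection and ReLU."""
    deg = adj.sum(1, keepdims=True)
    deg[deg == 0] = 1.0                        # guard isolated points
    aggregated = adj @ features / deg          # mean over graph neighbors
    return np.maximum(aggregated @ weight, 0)  # projection + ReLU

rng = np.random.default_rng(0)
pts = rng.random((64, 3))     # stand-in for a raw 3D point cloud
feats = rng.random((64, 16))  # stand-in for CNN-extracted per-point features
A = knn_graph(pts, k=6)
out = localized_graph_conv(feats, A, rng.random((16, 8)))
print(out.shape)  # (64, 8): one 8-dim vector per point
```

In a full segmentation network, `out` would be followed by further layers and a per-point classifier over the semantic labels; the sketch only shows how spatial graph structure and CNN features combine in one localized operator.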