Bridging the Reality Gap for Pose Estimation Networks using Sensor-Based Domain Randomization

Paper Title

Bridging the Reality Gap for Pose Estimation Networks using Sensor-Based Domain Randomization

Authors

Frederik Hagelskjaer, Anders Glent Buch

Abstract

Since the introduction of modern deep learning methods for object pose estimation, test accuracy and efficiency have increased significantly. For training, however, large amounts of annotated training data are required for good performance. While the use of synthetic training data removes the need for manual annotation, there is currently a large performance gap between methods trained on real data and methods trained on synthetic data. This paper introduces a new method that bridges this gap. Most methods trained on synthetic data use 2D images, as domain randomization in 2D is more developed. To obtain precise poses, many of these methods perform a final refinement using 3D data. Our method integrates the 3D data into the network to increase the accuracy of the pose estimation. To allow for domain randomization in 3D, a sensor-based data augmentation has been developed. Additionally, we introduce the SparseEdge feature, which uses a wider search space during point cloud propagation to avoid relying on specific features without increasing run-time. Experiments on three large pose estimation benchmarks show that the presented method outperforms previous methods trained on synthetic data and achieves results comparable to existing methods trained on real data.
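
The abstract does not specify how the sensor-based augmentation works, so the following is only a minimal sketch of what randomizing a synthetic point cloud in 3D could look like. It assumes two generic depth-sensor effects, missing returns and noise along the viewing ray, and is not the authors' implementation. The function name `augment_point_cloud` and the parameters `depth_noise_std` and `dropout_ratio` are hypothetical, as is the assumption that points are given in the camera frame in metres.

```python
import numpy as np


def augment_point_cloud(points, rng, depth_noise_std=0.002, dropout_ratio=0.1):
    """Apply a simple, assumed sensor-style perturbation to an (N, 3) point cloud.

    points: array of shape (N, 3), assumed to be in the camera frame (metres).
    rng: a numpy.random.Generator, so results are reproducible.
    depth_noise_std: hypothetical std. dev. of noise along each viewing ray.
    dropout_ratio: hypothetical fraction of points removed at random.
    """
    points = np.asarray(points, dtype=np.float64)

    # Randomly drop points to mimic missing depth returns.
    keep = rng.random(points.shape[0]) >= dropout_ratio
    kept = points[keep]

    # Unit viewing rays from the camera origin to each remaining point.
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    rays = kept / np.maximum(norms, 1e-12)

    # Shift each point along its own ray by Gaussian noise.
    noise = rng.normal(0.0, depth_noise_std, size=(kept.shape[0], 1))
    return kept + rays * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a rendered object, roughly 1 m from the camera.
    cloud = rng.uniform(-0.1, 0.1, size=(1000, 3)) + np.array([0.0, 0.0, 1.0])
    augmented = augment_point_cloud(cloud, rng)
    print(cloud.shape, augmented.shape)
```

Drawing fresh noise and a fresh dropout mask for every training sample is what would make this a form of randomization rather than a fixed distortion; the actual noise model would need to be matched to the sensor in use.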
