Paper Title

Zero-shot object goal visual navigation

Authors

Qianfan Zhao, Lu Zhang, Bin He, Hong Qiao, Zhiyong Liu

Abstract


Object goal visual navigation is a challenging task that aims to guide a robot to find a target object based on its visual observation, where the target is limited to classes pre-defined in the training stage. However, in real households, there may be numerous target classes that the robot needs to deal with, and it is hard for all of these classes to be covered in the training stage. To address this challenge, we study the zero-shot object goal visual navigation task, which aims at guiding robots to find targets belonging to novel classes without any training samples. To this end, we propose a novel zero-shot object navigation framework called the semantic similarity network (SSNet). Our framework uses the detection results and the cosine similarity between semantic word embeddings as input. This type of input data has a weak correlation with specific classes, so our framework can generalize its policy to novel classes. Extensive experiments on the AI2-THOR platform show that our model outperforms the baseline models on the zero-shot object navigation task, which demonstrates the generalization ability of our model. Our code is available at: https://github.com/pioneer-innovation/Zero-Shot-Object-Navigation.
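To illustrate the class-agnostic input described in the abstract, here is a minimal sketch of how cosine similarity between word embeddings can replace raw class IDs as a policy feature. The toy 4-d vectors, class names, and function names below are illustrative assumptions, not the paper's actual implementation (which would use pretrained embeddings such as GloVe or word2vec):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d "word embeddings" (hypothetical values; a real system would
# load pretrained vectors for each object class name).
embeddings = {
    "television": np.array([0.9, 0.1, 0.3, 0.2]),
    "laptop":     np.array([0.8, 0.2, 0.4, 0.1]),
    "apple":      np.array([0.1, 0.9, 0.2, 0.7]),
}

# The navigation target may be a novel class never seen in training.
target = "television"

# For each class the detector reports, the policy receives its semantic
# similarity to the target instead of a fixed class ID, so the input
# representation is only weakly tied to any pre-defined category set.
detected = ["laptop", "apple"]
features = {c: cosine_similarity(embeddings[c], embeddings[target])
            for c in detected}
```

With these toy vectors, the semantically closer class ("laptop") scores a higher similarity to "television" than the unrelated one ("apple"), which is exactly the signal that lets a policy trained on seen classes transfer to unseen targets.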
