Paper Title
Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests
Authors
Abstract
Various types of mental rotation tests have been used extensively in psychology to understand human visual reasoning and perception. Understanding what an object or visual scene would look like from another viewpoint is a challenging problem, made even harder if it must be done from a single image. We explore a controlled setting in which questions are posed about the properties of a scene as it would appear if observed from another viewpoint. To do this, we have created a new version of the CLEVR dataset that we call CLEVR Mental Rotation Tests (CLEVR-MRT). Using CLEVR-MRT, we examine standard methods, show how they fall short, and then explore novel neural architectures that infer volumetric representations of a scene. These volumes can be manipulated via camera-conditioned transformations to answer the question. We compare model variants through rigorous ablations and demonstrate the efficacy of volumetric representations.
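As a rough illustration of what a camera-conditioned transformation of a volume could look like, the following NumPy sketch resamples a feature volume under a rotation about its vertical axis. It is not the architecture from the paper. The function names `rotation_about_y` and `rotate_volume`, the `(C, D, H, W)` layout, the use of a single azimuth angle as the camera parameter, and the nearest-neighbour sampling are all assumptions made for this example. A learned model would more plausibly use a differentiable (e.g. trilinear) resampler.

```python
import numpy as np

def rotation_about_y(theta):
    """3x3 rotation matrix for angle theta (radians) about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rotate_volume(volume, theta):
    """Resample a (C, D, H, W) volume rotated by theta about its vertical axis.

    Hypothetical helper: nearest-neighbour sampling, zeros outside the grid.
    """
    C, D, H, W = volume.shape
    # Coordinates of every output voxel, ordered (x, y, z) and centred on the grid.
    z, y, x = np.meshgrid(np.arange(D), np.arange(H), np.arange(W), indexing="ij")
    centre = np.array([(W - 1) / 2, (H - 1) / 2, (D - 1) / 2])
    coords = np.stack([x, y, z], axis=-1).reshape(-1, 3) - centre
    # Inverse mapping: row-vector @ R applies R^T, i.e. the inverse rotation,
    # giving the source location that each output voxel reads from.
    src = np.rint(coords @ rotation_about_y(theta) + centre).astype(int)
    valid = ((src >= 0).all(axis=1)
             & (src[:, 0] < W) & (src[:, 1] < H) & (src[:, 2] < D))
    out = np.zeros((C, D * H * W), dtype=volume.dtype)
    out[:, valid] = volume[:, src[valid, 2], src[valid, 1], src[valid, 0]]
    return out.reshape(C, D, H, W)

# Example: a toy 2-channel 4x4x4 volume viewed after a quarter turn.
vol = np.arange(2 * 4 * 4 * 4, dtype=np.float64).reshape(2, 4, 4, 4)
turned = rotate_volume(vol, np.pi / 2)
```

On a cubic grid, a quarter turn only permutes voxels, so four successive quarter turns return the original volume. For other angles, nearest-neighbour sampling loses information near the corners.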