Paper Title

Modality Mixer for Multi-modal Action Recognition

Paper Authors

Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

Paper Abstract

In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content. In this paper, we propose a novel network, named Modality Mixer (M-Mixer) network, to leverage complementary information across modalities and temporal context of an action for multi-modal action recognition. We also introduce a simple yet effective recurrent unit, called Multi-modal Contextualization Unit (MCU), which is a core component of M-Mixer. Our MCU temporally encodes a sequence of one modality (e.g., RGB) with action content features of other modalities (e.g., depth, IR). This process encourages M-Mixer to exploit global action content and also to supplement complementary information of other modalities. As a result, our proposed method outperforms state-of-the-art methods on NTU RGB+D 60, NTU RGB+D 120, and NW-UCLA datasets. Moreover, we demonstrate the effectiveness of M-Mixer by conducting comprehensive ablation studies.
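To make the idea concrete, below is a minimal, hypothetical sketch of an MCU-style recurrent step: a GRU-like cell whose gates are conditioned not only on the hidden state and the current RGB feature, but also on a cross-modal context vector (e.g., pooled depth/IR features). All names, dimensions, and the single static context vector are illustrative assumptions, not the paper's actual formulation, which may condition on per-timestep features and use different gating.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MCUSketch:
    """Hypothetical sketch of a Multi-modal Contextualization Unit:
    a GRU-like recurrent cell whose update/reset gates and candidate
    state see a cross-modal context vector alongside the RGB feature.
    Not the paper's actual architecture."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        # Each weight maps [hidden; rgb_feature; context] -> dim.
        self.Wz = rng.uniform(-s, s, (dim, 3 * dim))  # update gate
        self.Wr = rng.uniform(-s, s, (dim, 3 * dim))  # reset gate
        self.Wh = rng.uniform(-s, s, (dim, 3 * dim))  # candidate state
        self.dim = dim

    def step(self, h, x, c):
        # Gates are computed from hidden state, RGB feature, and the
        # cross-modal action-content context, so temporal encoding of
        # one modality is modulated by the others.
        u = np.concatenate([h, x, c])
        z = sigmoid(self.Wz @ u)
        r = sigmoid(self.Wr @ u)
        h_tilde = np.tanh(self.Wh @ np.concatenate([r * h, x, c]))
        return (1.0 - z) * h + z * h_tilde

    def forward(self, rgb_seq, context):
        # Temporally encode an RGB feature sequence under a fixed
        # cross-modal context vector (a simplifying assumption here).
        h = np.zeros(self.dim)
        for x in rgb_seq:
            h = self.step(h, x, context)
        return h
```

In a full M-Mixer, one such unit per modality stream would consume that stream's features while receiving action-content context distilled from the remaining modalities, and the resulting representations would be fused for classification.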
