3D Convolutional with Attention for Action Recognition

Paper Title

3D Convolutional with Attention for Action Recognition

Authors

Labina Shrestha, Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Moongu Jeon

Abstract

Human action recognition is one of the challenging tasks in computer vision. Current action recognition methods use computationally expensive models to learn the spatio-temporal dependencies of an action. Models that use RGB channels and optical flow separately, models that use a two-stream fusion technique, and models that combine a convolutional neural network (CNN) with a long short-term memory (LSTM) network are a few examples of such complex models. Moreover, fine-tuning such complex models is computationally expensive as well. This paper proposes a deep neural network architecture for learning such dependencies, consisting of a 3D convolutional layer, fully connected (FC) layers, and an attention layer, which is simpler to implement and gives competitive performance on the UCF-101 dataset. The proposed method first learns spatial and temporal features of actions through a 3D-CNN, and then the attention mechanism helps the model focus on the features essential for recognition.
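The two stages named in the abstract, 3D convolution followed by attention, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not specify the form of the attention layer, so the dot-product scoring against a `query` vector used here is an assumption, and the function names `conv3d_valid` and `attention_pool` are hypothetical. The FC layers, multiple channels, and training are omitted.

```python
import math

def conv3d_valid(video, kernel):
    """Naive single-channel 3D cross-correlation, stride 1, no padding.
    video is a nested list [T][H][W]; kernel is a nested list [kT][kH][kW]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                s = 0.0
                # Sum over a window that spans both time and space.
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            s += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Assumed attention form: score each feature vector by its dot product
    with a (hypothetical, normally learned) query vector, then return the
    softmax weights and the weighted sum of the feature vectors."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled

# Toy input: 4 frames of 3x3 pixels, and a 2x2x2 kernel of ones.
video = [[[1.0] * 3 for _ in range(3)] for _ in range(4)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]

feature_maps = conv3d_valid(video, kernel)  # shape [3][2][2]
# Flatten each temporal slice into one feature vector per time step.
features = [[v for row in plane for v in row] for plane in feature_maps]
query = [0.1] * len(features[0])            # placeholder for a learned vector
weights, pooled = attention_pool(features, query)
```

In a full model, `pooled` would feed a classifier over the action classes, and both the kernel and the query would be learned rather than fixed.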
