Paper Title
3D Convolutional with Attention for Action Recognition
Paper Authors
Abstract
Human action recognition is a challenging task in computer vision. Current action recognition methods rely on computationally expensive models to learn the spatio-temporal dependencies of an action. Examples of such complex models include models that process RGB channels and optical flow separately, models that use two-stream fusion, and models that combine a convolutional neural network (CNN) with a long short-term memory (LSTM) network. Fine-tuning such models is also computationally expensive. This paper proposes a deep neural network architecture for learning these dependencies that consists of a 3D convolutional layer, fully connected (FC) layers, and an attention layer; it is simpler to implement and achieves competitive performance on the UCF-101 dataset. The proposed method first learns spatial and temporal features of actions through a 3D-CNN, and the attention mechanism then helps the model focus on the features that are essential for recognition.
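The pipeline described in the abstract (3D convolution, then attention, then a fully connected classifier) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract does not specify kernel sizes, the form of the attention layer, or the exact ordering of the FC and attention layers. The sketch assumes a single-channel clip, one 3D kernel followed by a ReLU, dot-product attention over time steps with a hypothetical learned `query` vector, and a single FC layer with 101 outputs (the number of classes in UCF-101). All function names and random weights are illustrative.

```python
import math
import random


def conv3d_valid(clip, kernel):
    """Naive single-channel 3D convolution (cross-correlation), stride 1, no padding.

    clip is a nested list indexed [T][H][W]; kernel is indexed [kt][kh][kw].
    A ReLU is applied to every output value (an assumption, not from the abstract).
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(max(s, 0.0))
            plane.append(row)
        out.append(plane)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Dot-product attention: score each feature vector against the query,
    normalize the scores with softmax, and return (weights, weighted sum)."""
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)]
    return weights, pooled


def linear(vec, weight_rows, bias):
    """Fully connected layer: one output per row of weight_rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight_rows, bias)]


# Toy forward pass with random, untrained weights.
random.seed(0)
clip = [[[random.random() for _ in range(5)] for _ in range(5)] for _ in range(4)]  # 4 frames of 5x5
kernel = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]  # 2x3x3

fmap = conv3d_valid(clip, kernel)  # 3 time steps, each a 3x3 map
features = [[v for row in plane for v in row] for plane in fmap]  # one 9-dim vector per time step

query = [random.uniform(-1, 1) for _ in range(9)]  # hypothetical learned attention query
weights, pooled = attention_pool(features, query)

num_classes = 101  # UCF-101 has 101 action classes
fc_w = [[random.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(num_classes)]
fc_b = [0.0] * num_classes
logits = linear(pooled, fc_w, fc_b)
```

Here `weights` holds one attention weight per time step of the convolutional feature map, so time steps whose features align with the query contribute more to `pooled`. A practical model would use a tensor library, multiple channels and kernels, and trained parameters.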