Paper Title

Feedback Graph Convolutional Network for Skeleton-based Action Recognition

Authors

Hao Yang, Dan Yan, Li Zhang, Dong Li, YunDa Sun, ShaoDi You, Stephen J. Maybank

Abstract

Skeleton-based action recognition has attracted considerable attention in computer vision, since skeleton data are more robust to dynamic circumstances and complicated backgrounds than other modalities. Recently, many researchers have used the Graph Convolutional Network (GCN) to model the spatial-temporal features of skeleton sequences through end-to-end optimization. However, conventional GCNs are feedforward networks, in which low-level layers cannot access the semantic information held in high-level layers. In this paper, we propose a novel network, named the Feedback Graph Convolutional Network (FGCN). This is the first work that introduces a feedback mechanism into GCNs and action recognition. Compared with conventional GCNs, FGCN has the following advantages: (1) a multi-stage temporal sampling strategy is designed to extract spatial-temporal features for action recognition in a coarse-to-fine progressive process; (2) a Feedback Graph Convolutional Block (FGCB) based on dense connections is proposed to introduce feedback connections into GCNs; it transmits high-level semantic features to the low-level layers and propagates temporal information stage by stage to progressively model global spatial-temporal features for action recognition; (3) the FGCN model provides early predictions. In the early stages, the model receives only partial information about an action, so its predictions are naturally relatively coarse. These coarse predictions are treated as priors that guide the feature learning of later stages toward an accurate prediction. Extensive experiments on the NTU-RGB+D, NTU-RGB+D120 and Northwestern-UCLA datasets demonstrate that the proposed FGCN is effective for action recognition, achieving state-of-the-art performance on all three datasets.
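
The feedback design described in the abstract can be pictured with a small, self-contained sketch. The PyTorch code below illustrates only the general idea, not the authors' FGCN/FGCB implementation: the skeleton sequence is split into temporal stages, each stage's graph-convolution block receives the previous stage's high-level features as feedback, and every stage emits an early (coarse) prediction that later stages refine. The classes SimpleGraphConv, FeedbackStageBlock and ToyFeedbackGCN, all layer sizes, and the placeholder adjacency matrix are assumptions made purely for illustration.

import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    # Plain spatial graph convolution, X' = relu(A X W): a stand-in for the
    # graph convolution units used in skeleton-based GCNs.
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)             # (V, V) normalized adjacency
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, x):                                # x: (N, T, V, C)
        x = torch.einsum("uv,ntvc->ntuc", self.A, x)     # aggregate features over joints
        return torch.relu(self.linear(x))


class FeedbackStageBlock(nn.Module):
    # One stage of processing: fuses the current stage's input with the
    # high-level features fed back from the previous stage.
    def __init__(self, channels, adjacency):
        super().__init__()
        self.gcn = SimpleGraphConv(channels * 2, channels, adjacency)

    def forward(self, x, feedback):                      # both (N, T, V, C)
        return self.gcn(torch.cat([x, feedback], dim=-1))


class ToyFeedbackGCN(nn.Module):
    # Coarse-to-fine recognition over temporally sampled stages with feedback.
    def __init__(self, in_channels, hidden, num_classes, adjacency, num_stages=4):
        super().__init__()
        self.num_stages = num_stages
        self.embed = SimpleGraphConv(in_channels, hidden, adjacency)
        self.stage_block = FeedbackStageBlock(hidden, adjacency)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):                                # x: (N, T, V, C) skeleton sequence
        stages = torch.chunk(x, self.num_stages, dim=1)  # multi-stage temporal sampling
        feedback = None
        stage_logits = []                                # one (coarse -> finer) prediction per stage
        for clip in stages:
            h = self.embed(clip)
            if feedback is None:
                feedback = torch.zeros_like(h)           # no feedback before the first stage
            # summarize the previous stage over time and broadcast it to this stage's frames
            h = self.stage_block(h, feedback.mean(dim=1, keepdim=True).expand_as(h))
            feedback = h                                 # high-level features flow to the next stage
            stage_logits.append(self.classifier(h.mean(dim=(1, 2))))
        return stage_logits


# Example: 2 sequences, 64 frames, 25 joints (as in NTU-RGB+D), 3-D joint coordinates.
A = torch.eye(25)                                        # identity adjacency as a placeholder
model = ToyFeedbackGCN(in_channels=3, hidden=16, num_classes=60, adjacency=A)
predictions = model(torch.randn(2, 64, 25, 3))           # list of 4 per-stage logits, each (2, 60)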
