Paper Title

Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos

Paper Authors

Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Xiaohui Xie

Paper Abstract

Estimating 3D hand pose directly from RGB images is challenging but has gained steady progress recently by training deep models with annotated 3D poses. However, annotating 3D poses is difficult, and as such only a few 3D hand pose datasets are available, all with limited sample sizes. In this study, we propose a new framework for training 3D pose estimation models from RGB images without using explicit 3D annotations, i.e., trained with only 2D information. Our framework is motivated by two observations: 1) videos provide richer information for estimating 3D poses than static images; 2) estimated 3D poses ought to be consistent whether the videos are viewed in forward order or reverse order. We leverage these two observations to develop a self-supervised learning model called the temporal-aware self-supervised network (TASSN). By enforcing temporal consistency constraints, TASSN learns 3D hand poses and meshes from videos with only 2D keypoint position annotations. Experiments show that our model achieves surprisingly good results, with 3D estimation accuracy on par with state-of-the-art models trained with 3D annotations, highlighting the benefit of temporal consistency in constraining 3D prediction models.
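To make the abstract's two training signals concrete, below is a minimal PyTorch sketch of the idea, not the authors' TASSN implementation: a 2D reprojection loss that uses only 2D keypoint annotations, and a temporal consistency loss that asks the model to predict the same 3D poses whether a clip is processed in forward or reverse frame order. The `ToyPoseNet` model, the orthographic `project_to_2d` camera stand-in, and all tensor shapes are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyPoseNet(nn.Module):
    """Hypothetical clip-level 3D hand pose regressor: (B, T, 3, H, W) -> (B, T, J, 3).

    The GRU gives the model temporal context, so its predictions genuinely
    depend on frame order; a purely per-frame model would satisfy the
    consistency constraint trivially.
    """

    def __init__(self, num_joints: int = 21, feat_dim: int = 16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_joints * 3)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)  # (B, T, F)
        feats, _ = self.temporal(feats)                            # add temporal context
        return self.head(feats).view(b, t, -1, 3)                  # (B, T, J, 3)


def project_to_2d(joints_3d: torch.Tensor) -> torch.Tensor:
    """Stand-in camera model (orthographic): keep x, y and drop z."""
    return joints_3d[..., :2]


def reprojection_loss(pred_3d: torch.Tensor, keypoints_2d: torch.Tensor) -> torch.Tensor:
    """2D-only supervision: compare projected 3D predictions with 2D annotations."""
    return F.mse_loss(project_to_2d(pred_3d), keypoints_2d)


def temporal_consistency_loss(model: nn.Module, clips: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between forward-order and reverse-order predictions."""
    forward_pred = model(clips)                  # frames in original order
    reverse_pred = model(clips.flip(dims=[1]))   # frames in reverse order
    # Flip the reversed predictions back so frame t is compared with frame t.
    return F.mse_loss(forward_pred, reverse_pred.flip(dims=[1]))


if __name__ == "__main__":
    model = ToyPoseNet()
    clips = torch.randn(2, 8, 3, 64, 64)         # two clips of 8 RGB frames
    keypoints_2d = torch.randn(2, 8, 21, 2)      # 2D keypoint annotations only
    loss = reprojection_loss(model(clips), keypoints_2d) \
        + temporal_consistency_loss(model, clips)
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```

The consistency term is only informative here because the GRU makes predictions order-dependent; the paper's mesh estimation branch and its actual camera model are omitted from this sketch.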
