Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation

Paper Title

Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation

Authors

Kundu, Jogendra Nath; Seth, Siddharth; Jamkhandi, Anirudh; YM, Pradyumna; Jampani, Varun; Chakraborty, Anirban; Babu, R. Venkatesh

Abstract

Available 3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision. Barring synthetic or in-studio domains, acquiring such supervision for each new target environment is highly inconvenient. To this end, we cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target. We propose to infer image-to-pose via two explicit mappings, viz. image-to-latent and latent-to-pose, where the latter is a pre-learned decoder obtained from a prior-enforcing generative adversarial auto-encoder. Next, we introduce relation distillation as a means to align the unpaired cross-modal samples, i.e., the unpaired target videos and unpaired 3D pose sequences. To this end, we propose a new set of non-local relations to characterize long-range latent pose interactions, unlike general contrastive relations where positive couplings are limited to a local neighborhood structure. Further, we provide an objective way to quantify non-localness in order to select the most effective relation set. We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
