Paper Title

VL4Pose: Active Learning Through Out-Of-Distribution Detection For Pose Estimation

Paper Authors

Shukla, Megh, Roy, Roshan, Singh, Pankaj, Ahmed, Shuaib, Alahi, Alexandre

Abstract

Advances in computing have enabled widespread access to pose estimation, creating new sources of data streams. Unlike mock set-ups for data collection, tapping into these data streams through on-device active learning allows us to directly sample from the real world to improve the spread of the training distribution. However, on-device computing power is limited, implying that any candidate active learning algorithm should have a low compute footprint while also being reliable. Although multiple algorithms cater to pose estimation, they either use extensive compute to power state-of-the-art results or are not competitive in low-resource settings. We address this limitation with VL4Pose (Visual Likelihood For Pose Estimation), a first principles approach for active learning through out-of-distribution detection. We begin with a simple premise: pose estimators often predict incoherent poses for out-of-distribution samples. Hence, can we identify a distribution of poses the model has been trained on, to identify incoherent poses the model is unsure of? Our solution involves modelling the pose through a simple parametric Bayesian network trained via maximum likelihood estimation. Therefore, poses incurring a low likelihood within our framework are out-of-distribution samples making them suitable candidates for annotation. We also observe two useful side-outcomes: VL4Pose in-principle yields better uncertainty estimates by unifying joint and pose level ambiguity, as well as the unintentional but welcome ability of VL4Pose to perform pose refinement in limited scenarios. We perform qualitative and quantitative experiments on three datasets: MPII, LSP and ICVL, spanning human and hand pose estimation. Finally, we note that VL4Pose is simple, computationally inexpensive and competitive, making it suitable for challenging tasks such as on-device active learning.
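
As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below fits a simple maximum-likelihood model of plausible poses and uses its likelihood to flag out-of-distribution predictions as candidates for annotation. The skeleton, the choice of independent Gaussians over limb lengths, and all function names are illustrative assumptions made here for clarity; VL4Pose itself models the full pose with a parametric Bayesian network trained via maximum likelihood estimation.

```python
import numpy as np

# Hypothetical skeleton: (parent_joint, child_joint) index pairs forming a tree.
# The actual joint set depends on the dataset (e.g. MPII, LSP, ICVL).
SKELETON = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]


def limb_lengths(pose, skeleton=SKELETON):
    """Euclidean length of each limb for a (num_joints, 2) keypoint array."""
    return np.array([np.linalg.norm(pose[c] - pose[p]) for p, c in skeleton])


def fit_limb_model(train_poses):
    """Maximum-likelihood fit of an independent Gaussian per limb length."""
    lengths = np.stack([limb_lengths(p) for p in train_poses])  # (N, num_limbs)
    mu = lengths.mean(axis=0)
    sigma = lengths.std(axis=0) + 1e-6  # avoid zero variance
    return mu, sigma


def pose_log_likelihood(pose, mu, sigma):
    """Sum of Gaussian log-likelihoods over limbs; a low value suggests an
    incoherent (out-of-distribution) pose."""
    x = limb_lengths(pose)
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))


def select_for_annotation(predicted_poses, mu, sigma, budget):
    """Active-learning step: pick the `budget` poses with the lowest likelihood."""
    scores = np.array([pose_log_likelihood(p, mu, sigma) for p in predicted_poses])
    return np.argsort(scores)[:budget]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [rng.normal(size=(6, 2)) for _ in range(100)]      # stand-in labelled poses
    unlabelled = [rng.normal(size=(6, 2)) for _ in range(20)]  # stand-in predicted poses
    mu, sigma = fit_limb_model(train)
    print(select_for_annotation(unlabelled, mu, sigma, budget=5))
```

In this toy version the limb-length statistics are computed directly from labelled poses; the low compute cost of scoring a pose (a handful of Gaussian evaluations) is what makes a likelihood-based criterion attractive for on-device active learning.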
