An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation

Title

An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation

Authors

Xie, Rongchang; Wang, Chunyu; Zeng, Wenjun; Wang, Yizhou

Abstract

Semi-supervised learning aims to boost the accuracy of a model by exploring unlabeled images. The state-of-the-art methods are consistency-based: they learn from unlabeled images by encouraging the model to give consistent predictions for the same image under different augmentations. However, when applied to pose estimation, these methods degenerate and predict every pixel in unlabeled images as background. This is because contradictory predictions are gradually pushed toward the background class due to the highly imbalanced class distribution. This is not an issue in supervised learning, which has accurate labels. This motivates us to stabilize training by obtaining reliable pseudo labels. Specifically, we train two networks that teach each other. For each image, we compose an easy-hard pair by applying different augmentations and feed both images to the two networks. The more reliable predictions on the easy image from each network are used to teach the other network about the corresponding hard image. The approach successfully avoids degeneration and achieves promising results on public datasets. The source code and pretrained models have been released at https://github.com/xierc/Semi_Human_Pose.
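
The mutual-teaching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it only shows the data flow for one unlabeled image. All names (mse, mutual_teaching_losses, net_a, net_b, easy_aug, hard_aug) are hypothetical. The sketch assumes heatmaps are flat lists of floats, that both augmentations preserve pixel locations so no geometric re-alignment of the pseudo labels is needed, and that the pseudo labels are treated as fixed targets.

```python
# Hypothetical sketch of mutual teaching on an easy-hard pair.
# Heatmaps are flat lists of floats to keep the example dependency-free.


def mse(pred, target):
    """Mean squared error between two equally sized heatmaps."""
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mutual_teaching_losses(net_a, net_b, image, easy_aug, hard_aug):
    """Return (loss_a, loss_b) for one unlabeled image.

    Each network predicts on the easy view to produce a pseudo label,
    and that pseudo label supervises the OTHER network on the hard view.
    """
    easy = easy_aug(image)
    hard = hard_aug(image)

    # Pseudo labels from the easy view (assumed to be constants,
    # i.e. no gradient would flow through them in a real framework).
    pseudo_from_a = net_a(easy)
    pseudo_from_b = net_b(easy)

    # Cross supervision: B teaches A, and A teaches B.
    loss_a = mse(net_a(hard), pseudo_from_b)
    loss_b = mse(net_b(hard), pseudo_from_a)
    return loss_a, loss_b


if __name__ == "__main__":
    # Toy stand-ins: "networks" scale the input, the hard view masks it out.
    image = [1.0, 1.0, 1.0, 1.0]
    net_a = lambda x: [0.5 * v for v in x]
    net_b = lambda x: [0.25 * v for v in x]
    identity = lambda x: list(x)
    mask_all = lambda x: [0.0 for _ in x]
    print(mutual_teaching_losses(net_a, net_b, image, identity, mask_all))
    # prints (0.0625, 0.25)
```

In a real training loop these unsupervised losses would be added to the usual supervised loss on labeled images, and each network would be updated only with its own loss.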
