Title

Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks

Authors

Xiaopin Zhong, Guankun Wang, Weixiang Liu, Zongze Wu, Yuanlong Deng

Abstract

As a fundamental computer vision task, crowd counting plays an important role in public safety. Deep learning based head detection is currently a promising approach to crowd counting. However, widely studied object detection networks cannot be applied well to this problem, for three reasons: (1) existing loss functions fail to address sample imbalance in highly dense and complex scenes; (2) canonical object detectors lack spatial coherence in loss calculation, disregarding the relationship between object locations and background regions; (3) most head detection datasets are annotated only with center points, i.e., without bounding boxes. To overcome these issues, we propose a novel Mask Focal Loss (MFL) based on Gaussian-kernel heatmaps. MFL provides a unifying framework for loss functions based on both heatmap and binary feature map ground truths. Additionally, we introduce GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation and comparison. Extensive experimental results demonstrate the superior performance of MFL across various detectors and datasets: it reduces MAE and RMSE by up to 47.03% and 61.99%, respectively. Our work therefore provides a strong foundation for advancing crowd counting methods based on density estimation.
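
The abstract does not give the MFL formula, so the following is only a minimal sketch of the ingredients it names, not the authors' implementation. It rests on several assumptions. Head centers are integer pixel coordinates (x, y). The ground-truth heatmap takes the pixel-wise maximum of Gaussians with a hypothetical sigma of 2.0. The loss is the penalty-reduced focal loss popularized by CenterNet (alpha = 2, beta = 4), used here as a stand-in for MFL. All function names are hypothetical. A binary feature map is the special case in which every non-center pixel is 0. In that case the factor (1 - t)^beta equals 1, and the negative term reduces to the plain focal-loss form p^alpha * log(1 - p). This is one way a single loss can cover both kinds of ground truth, although the paper may unify them differently. MAE and RMSE are computed on per-image head counts.

```python
import math


def gaussian_heatmap(points, height, width, sigma=2.0):
    """Render head centers (x, y) into a heatmap with values in [0, 1].

    Each pixel keeps the maximum over all Gaussians, so overlapping heads
    in dense crowds never push a value above 1 (sigma is an assumption).
    """
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                g = math.exp(-d2 / (2.0 * sigma ** 2))
                if g > heat[y][x]:
                    heat[y][x] = g
    return heat


def binary_map(points, height, width):
    """Binary ground truth: 1 at each head center, 0 everywhere else."""
    m = [[0.0] * width for _ in range(height)]
    for cx, cy in points:
        m[cy][cx] = 1.0
    return m


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style penalty-reduced focal loss (a stand-in, not MFL).

    Pixels with target == 1 are positives. Every other pixel is a negative
    whose penalty is scaled by (1 - target) ** beta, which equals 1 when
    the target is a binary map.
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for pred_row, tgt_row in zip(pred, target):
        for p, t in zip(pred_row, tgt_row):
            p = min(max(p, eps), 1.0 - eps)  # keep both logs finite
            if t == 1.0:
                pos_loss += (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                neg_loss += (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    # Normalize by the number of heads, as CenterNet does.
    return -(pos_loss + neg_loss) / max(num_pos, 1)


def mae_rmse(pred_counts, true_counts):
    """Mean absolute error and root mean squared error of per-image counts."""
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


if __name__ == "__main__":
    heads = [(2, 2), (6, 3)]  # hypothetical head centers (x, y)
    gt_heat = gaussian_heatmap(heads, height=6, width=9)
    gt_bin = binary_map(heads, height=6, width=9)
    pred = [[0.1] * 9 for _ in range(6)]  # a flat, uninformed prediction
    print(heatmap_focal_loss(pred, gt_heat))
    print(heatmap_focal_loss(pred, gt_bin))
    print(mae_rmse([10, 12], [12, 12]))
```

The Gaussian target down-weights negatives that lie near each head. As a result, the flat prediction in the example incurs a smaller loss against the heatmap than against the binary map, while both targets share the same positives.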