Paper Title
A Survey on Deep Learning Techniques for Stereo-based Depth Estimation
Paper Authors
Paper Abstract
Estimating depth from RGB images is a long-standing ill-posed problem that the computer vision, graphics, and machine learning communities have explored for decades. Among existing techniques, stereo matching remains one of the most widely used in the literature because of its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed by matching hand-crafted features across multiple images. Despite extensive research, these traditional techniques still struggle in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by its growing success in solving various 2D and 3D vision problems, deep learning for stereo-based depth estimation has attracted increasing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this article, we provide a comprehensive survey of this new and continuously growing field of research, summarize the most commonly used pipelines, and discuss their benefits and limitations. Looking back at what has been achieved so far, we also conjecture what the future may hold for deep learning-based stereo depth estimation research.