Paper Title
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Authors
Abstract
The dynamic attention mechanism and global modeling ability give the Transformer strong feature learning ability. In recent years, Transformers have become comparable to CNN-based methods in computer vision. This review investigates the current research progress of Transformers in image and video applications and provides a comprehensive overview of Transformers in visual learning understanding. First, the attention mechanism, which plays an essential part in the Transformer, is reviewed. Next, the visual Transformer model and the principle of each module are introduced. Third, existing Transformer-based models are investigated, and their performance is compared in visual learning understanding applications. Three image tasks and two video tasks of computer vision are covered. The former mainly include image classification, object detection, and image segmentation. The latter comprise object tracking and video classification. The review places particular weight on comparing the performance of different models across these tasks on several public benchmark datasets. Finally, ten general problems are summarized, and the development prospects of the visual Transformer are discussed.