CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

Title

CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

Authors

Jun Wang, Abhir Bhalerao, Terry Yin, Simon See, Yulan He

Abstract

Radiology report generation (RRG) has gained increasing research attention because of its potential to mitigate medical resource shortages and to aid radiologists in disease decision making. Recent advances in RRG are largely driven by improving a model's ability to encode single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus first on abnormal image regions before composing the corresponding text descriptions, so cross-modal alignment is important for learning an RRG model that is aware of abnormalities in the image. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet), which explicitly promotes cross-modal alignment by employing aggregated class activation maps to supervise cross-modal attention learning, and simultaneously enriches the discriminative information. CAMANet contains three complementary modules: a Visual Discriminative Map Generation module to generate the importance/contribution of each visual token; a Visual Discriminative Map Assisted Encoder to learn the discriminative representation and enrich the discriminative information; and a Visual Textual Attention Consistency module to ensure attention consistency between the visual and textual tokens and thereby achieve cross-modal alignment. Experimental results demonstrate that CAMANet outperforms previous state-of-the-art methods on two commonly used RRG benchmarks.
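The abstract names the idea of using aggregated class activation maps to supervise cross-modal attention but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a linear classifier over visual token features (so a class activation map is a dot product per token), a probability-weighted sum of rectified maps as the aggregation, averaging cross-attention over text tokens, and a mean squared error as the consistency measure. All function names and these choices are hypothetical.

```python
def class_activation_maps(tokens, class_weights):
    """Per-class, per-token score: cam[c][i] = sum_k w[c][k] * tokens[i][k].

    Assumption: a linear classifier on pooled token features, as in
    standard class activation mapping.
    """
    return [[sum(w_k * f_k for w_k, f_k in zip(w, tok)) for tok in tokens]
            for w in class_weights]


def aggregate_cams(cams, class_probs):
    """Merge per-class maps into one importance map over visual tokens.

    Assumption: probability-weighted sum of rectified maps, normalised
    to sum to 1 (uniform if every score is zero).
    """
    n = len(cams[0])
    agg = [sum(p * max(cam[i], 0.0) for p, cam in zip(class_probs, cams))
           for i in range(n)]
    total = sum(agg)
    if total == 0.0:
        return [1.0 / n] * n
    return [a / total for a in agg]


def attention_importance(cross_attention):
    """Average a (text tokens x visual tokens) attention matrix over text
    tokens and renormalise, giving one weight per visual token."""
    t = len(cross_attention)
    n = len(cross_attention[0])
    mean = [sum(row[i] for row in cross_attention) / t for i in range(n)]
    total = sum(mean)
    return [m / total for m in mean]


def consistency_loss(target, predicted):
    """Mean squared error between two importance maps (an assumed choice)."""
    return sum((a - b) ** 2 for a, b in zip(target, predicted)) / len(target)


# Toy example: 3 visual tokens with 2-dimensional features, 2 classes.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_weights = [[1.0, 0.0], [0.0, -1.0]]
class_probs = [0.8, 0.2]

cams = class_activation_maps(tokens, class_weights)
discriminative_map = aggregate_cams(cams, class_probs)

aligned = attention_importance([[0.5, 0.0, 0.5], [0.5, 0.0, 0.5]])
misaligned = attention_importance([[0.2, 0.6, 0.2]])

loss_aligned = consistency_loss(discriminative_map, aligned)
loss_misaligned = consistency_loss(discriminative_map, misaligned)
```

In this toy setting the attention that concentrates on the same tokens as the aggregated map incurs a lower loss than the attention that concentrates elsewhere, which is the behaviour a consistency term of this kind is meant to encourage.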
