Improving Radiology Summarization with Radiograph and Anatomy Prompts

Paper title

Improving Radiology Summarization with Radiograph and Anatomy Prompts

Authors

Jinpeng Hu, Zhihong Chen, Yang Liu, Xiang Wan, Tsung-Hui Chang

Abstract

The impression is crucial for referring physicians to grasp key information, since it is concluded from the findings and reasoning of radiologists. To alleviate the workload of radiologists and reduce repetitive human labor in impression writing, many researchers have focused on automatic impression generation. However, recent works on this task mainly summarize the corresponding findings and pay less attention to the radiology images. In clinical practice, radiographs can provide more detailed and valuable observations to enhance radiologists' impression writing, especially for complicated cases. Besides, each sentence in the findings usually focuses on a single anatomy, so it only needs to be matched to the corresponding anatomical region instead of the whole image, which benefits the alignment of textual and visual features. Therefore, we propose a novel anatomy-enhanced multimodal model to promote impression generation. In detail, we first construct a set of rules to extract anatomies and put these prompts into each sentence to highlight anatomy characteristics. Then, two separate encoders are applied to extract features from the radiograph and the findings. Afterward, we utilize a contrastive learning module to align these two representations at the overall level and use co-attention to fuse them at the sentence level with the help of the anatomy-enhanced sentence representations. Finally, the decoder takes the fused information as input to generate impressions. The experimental results on two benchmark datasets confirm the effectiveness of the proposed method, which achieves state-of-the-art results.
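The first step in the abstract, rule-based extraction of anatomies and insertion of anatomy prompts into each findings sentence, might look roughly like the sketch below. This is only an illustration: the abstract does not give the actual rules or the prompt format, so the keyword table `ANATOMY_RULES`, the bracketed `[anatomy]` prefix, the `other` fallback label, and the function names `detect_anatomy` and `add_anatomy_prompts` are all assumptions.

```python
import re

# Hypothetical keyword rules. The paper's actual rule set is not given in the
# abstract; these entries only illustrate the idea.
ANATOMY_RULES = {
    "lung": ["lung", "lungs", "pulmonary"],
    "heart": ["heart", "cardiac", "cardiomediastinal"],
    "pleura": ["pleural", "pneumothorax", "effusion"],
    "bone": ["rib", "ribs", "spine", "osseous"],
}


def detect_anatomy(sentence):
    """Return the first anatomy whose keywords appear in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    for anatomy, keywords in ANATOMY_RULES.items():
        if words & set(keywords):
            return anatomy
    return "other"  # assumed fallback when no rule matches


def add_anatomy_prompts(findings):
    """Split findings into sentences and prefix each with an anatomy prompt."""
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", findings)
        if s.strip()
    ]
    return [f"[{detect_anatomy(s)}] {s}" for s in sentences]


if __name__ == "__main__":
    text = "The lungs are clear. Heart size is normal. No acute findings."
    for line in add_anatomy_prompts(text):
        print(line)
```

In the described model, the prompted sentences would then be passed to the text encoder, so that each sentence representation carries its anatomy before it is fused with the image features; that encoder and the fusion step are outside the scope of this sketch.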
