Paper Title
GASP: Gated Attention For Saliency Prediction
Paper Authors
Abstract
Saliency prediction refers to the computational task of modeling overt attention. Social cues greatly influence our attention, consequently altering our eye movements and behavior. To emphasize the efficacy of such features, we present a neural model for integrating social cues and weighting their influences. Our model consists of two stages. During the first stage, we detect two social cues by following gaze, estimating gaze direction, and recognizing affect. These features are then transformed into spatiotemporal maps through image processing operations. The transformed representations are propagated to the second stage (GASP), where we explore various techniques of late fusion for integrating social cues and introduce two sub-networks for directing attention to relevant stimuli. Our experiments indicate that fusion approaches achieve better results for static integration methods, whereas non-fusion approaches, for which the influence of each modality is unknown, result in better outcomes when coupled with recurrent models for dynamic saliency prediction. We show that gaze direction and affective representations improve the correspondence between predictions and ground truth by at least 5% compared to dynamic saliency models without social cues. Furthermore, affective representations improve GASP, supporting the necessity of considering affect-biased attention in predicting saliency.
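The gated late fusion described above can be illustrated with a minimal sketch: per-modality spatiotemporal maps are combined through a softmax-normalized gate so that each social cue's influence is explicitly weighted. This is an illustrative toy, not GASP's actual architecture; in the paper the gate weights are produced by learned sub-networks rather than fixed logits, and the maps are derived from gaze-following, gaze-direction, and affect-recognition modules.

```python
import numpy as np

def gated_late_fusion(modality_maps, gate_logits):
    """Fuse per-modality saliency maps with softmax gates.

    A minimal sketch of gated late fusion: each map is weighted
    by a gate, and the fused map is their convex combination.
    In GASP the gates are learned, not hand-set as here.
    """
    gates = np.exp(gate_logits - np.max(gate_logits))
    gates /= gates.sum()                      # softmax over modalities
    maps = np.stack(modality_maps, axis=0)    # (M, H, W)
    return np.tensordot(gates, maps, axes=1)  # weighted sum -> (H, W)

# Toy example: three hypothetical cue maps
# (gaze following, gaze direction, affect)
rng = np.random.default_rng(0)
maps = [rng.random((4, 4)) for _ in range(3)]
fused = gated_late_fusion(maps, np.array([0.2, 1.5, -0.3]))
```

Because the gates form a convex combination, the fused map stays within the range spanned by the input maps, which makes the relative contribution of each modality directly interpretable.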