Paper Title
Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2
Paper Authors
Paper Abstract
Image aesthetic quality assessment has been popular during the last decade. Besides numerical assessment, natural language assessment (aesthetic captioning) has been proposed to describe the general aesthetic impression of an image. In this paper, we propose aesthetic attribute assessment, i.e., aesthetic attribute captioning, which assesses aesthetic attributes such as composition, lighting usage, and color arrangement. Labeling comments on aesthetic attributes is a non-trivial task, which limits the scale of the corresponding datasets. We construct a novel dataset, named DPC-CaptionsV2, in a semi-automatic way: knowledge is transferred from a small-scale dataset with full annotations to large-scale professional comments from a photography website. Images of DPC-CaptionsV2 contain comments on up to 4 aesthetic attributes: composition, lighting, color, and subject. We then propose a new version of the Aesthetic Multi-Attribute Network (AMANv2), based on the BUTD model and the VLPSA model. AMANv2 fuses features from a mixture of the small-scale PCCD dataset and the large-scale DPC-CaptionsV2 dataset, both with full annotations. Experimental results on DPC-CaptionsV2 show that our method predicts comments on the 4 aesthetic attributes that are closer to aesthetic topics than those produced by the previous AMAN model. Under the evaluation criteria of image captioning, the specially designed AMANv2 model outperforms the CNN-LSTM model and the AMAN model.
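
The abstract does not detail how knowledge is transferred from the fully annotated small-scale dataset to the raw photography-website comments. A plausible reading is that per-attribute comments are routed by keyword matching. The sketch below is a minimal, hypothetical illustration of such a routing step; the keyword lists, the `ATTRIBUTE_KEYWORDS` name, and the `assign_comments` helper are all invented for illustration, and the paper's actual pipeline is certainly more involved.

```python
# Hypothetical sketch: route raw website comments to aesthetic attributes by
# keyword matching. In the paper's setting, attribute keyword vocabularies
# would presumably be mined from the fully annotated PCCD dataset.

from collections import defaultdict

# Invented example vocabularies, one set of trigger words per attribute.
ATTRIBUTE_KEYWORDS = {
    "composition": {"composition", "framing", "crop", "rule of thirds"},
    "lighting":    {"light", "lighting", "exposure", "shadow"},
    "color":       {"color", "tone", "saturation", "hue"},
    "subject":     {"subject", "model", "pose", "expression"},
}

def assign_comments(comments):
    """Assign each free-form comment to every attribute whose keywords it mentions."""
    buckets = defaultdict(list)
    for comment in comments:
        text = comment.lower()
        for attribute, keywords in ATTRIBUTE_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                buckets[attribute].append(comment)
    return buckets

comments = [
    "The rule of thirds works well here.",
    "Lovely warm tones, though the shadows are a bit heavy.",
]
print(dict(assign_comments(comments)))
# {'composition': [...], 'color': [...], 'lighting': [...]}
```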
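
AMANv2 itself builds on the BUTD and VLPSA models, whose architectures are beyond the scope of this abstract. To make the "per-attribute captioning over shared image features" idea concrete, the following is a heavily simplified, hypothetical skeleton, not the actual AMANv2 architecture: a shared feature projection feeds four attribute-specific LSTM decoders. The `MultiAttributeCaptioner` class and all dimensions are illustrative assumptions.

```python
# Illustrative skeleton only: one shared image-feature projection with four
# attribute-specific caption decoders. This does NOT reproduce AMANv2, which
# is built on the BUTD and VLPSA models.

import torch
import torch.nn as nn

ATTRIBUTES = ["composition", "lighting", "color", "subject"]

class MultiAttributeCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, vocab_size=10000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)  # shared image projection
        self.decoders = nn.ModuleDict(
            {a: nn.LSTM(hidden, hidden, batch_first=True) for a in ATTRIBUTES}
        )
        self.heads = nn.ModuleDict(
            {a: nn.Linear(hidden, vocab_size) for a in ATTRIBUTES}
        )

    def forward(self, feats, steps=12):
        # feats: (batch, feat_dim) pooled image features.
        h = self.proj(feats).unsqueeze(1)   # (batch, 1, hidden)
        inp = h.repeat(1, steps, 1)         # crude: feed the image at every step
        out = {}
        for a in ATTRIBUTES:
            seq, _ = self.decoders[a](inp)  # (batch, steps, hidden)
            out[a] = self.heads[a](seq)     # per-step vocabulary logits
        return out

model = MultiAttributeCaptioner()
logits = model(torch.randn(2, 2048))
print({a: tuple(t.shape) for a, t in logits.items()})
# {'composition': (2, 12, 10000), 'lighting': (2, 12, 10000), ...}
```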
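
The "evaluation criteria of image captioning" conventionally include BLEU, METEOR, ROUGE, and CIDEr; the abstract does not specify which are used. As a small sketch of how one such metric scores a generated attribute comment against a reference, the snippet below computes sentence-level BLEU-4 with NLTK (the `bleu4` helper is an assumption, not the paper's evaluation code).

```python
# Sketch: score a generated attribute comment against a reference comment
# with sentence-level BLEU-4 via NLTK. Other captioning metrics (METEOR,
# ROUGE, CIDEr) would typically be reported alongside BLEU.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu4(reference: str, candidate: str) -> float:
    """BLEU-4 between one reference comment and one predicted comment."""
    smooth = SmoothingFunction().method1  # avoids zero scores on short texts
    return sentence_bleu(
        [reference.lower().split()],
        candidate.lower().split(),
        weights=(0.25, 0.25, 0.25, 0.25),
        smoothing_function=smooth,
    )

reference = "the composition follows the rule of thirds nicely"
candidate = "nice composition using the rule of thirds"
print(f"BLEU-4: {bleu4(reference, candidate):.3f}")
```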