Paper Title
ArmanEmo: A Persian Dataset for Text-based Emotion Detection
Paper Authors
Abstract
With the recent proliferation of open textual data on social media platforms, Emotion Detection (ED) from text has received increasing attention in recent years. It has many applications, especially for businesses and online service providers, where emotion detection techniques can help them make informed commercial decisions by analyzing how customers and users feel about their products and services. In this study, we introduce ArmanEmo, a human-labeled emotion dataset of more than 7,000 Persian sentences annotated with seven categories. The dataset was collected from different sources, including Twitter, Instagram, and comments on Digikala (an Iranian e-commerce company). The labels are based on Ekman's six basic emotions (Anger, Fear, Happiness, Hatred, Sadness, Wonder), plus an additional category (Other) that covers any emotion not included in Ekman's model. Along with the dataset, we provide several baseline models for emotion classification, focusing on state-of-the-art transformer-based language models. Our best model achieves a macro-averaged F1 score of 75.39 percent on our test set. We also conduct transfer learning experiments to compare the generalization of our dataset against other Persian emotion datasets. The results of these experiments suggest that our dataset generalizes better than the existing Persian emotion datasets. ArmanEmo is publicly available for non-commercial use at https://github.com/Arman-Rayan-Sharif/arman-text-emotion.
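The headline result above is a macro-averaged F1 score. Macro-averaging computes F1 separately for each emotion class and then takes the unweighted mean, so a rare class such as Fear counts as much as a frequent one. The following is a minimal pure-Python sketch of that metric, assuming single-label predictions over the seven category names listed in the abstract. The function name `macro_f1` and the toy inputs are hypothetical and not taken from the ArmanEmo repository; the paper's evaluation may instead use a library implementation such as scikit-learn's `f1_score(..., average="macro")`.

```python
from collections import Counter

# The seven ArmanEmo categories as named in the abstract.
LABELS = ["Anger", "Fear", "Happiness", "Hatred", "Sadness", "Wonder", "Other"]


def macro_f1(y_true, y_pred, labels=None):
    """Hypothetical helper: unweighted mean of per-class F1 scores.

    If `labels` is None, average over every class seen in y_true or y_pred.
    If `labels` is given, a class that never appears scores F1 = 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the gold label was different
            fn[t] += 1  # gold label t was missed

    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["Anger", "Fear", "Happiness", "Happiness"]
    pred = ["Anger", "Happiness", "Happiness", "Fear"]
    # Per-class F1: Anger 1.0, Fear 0.0, Happiness 0.5 -> mean 0.5
    print(f"macro-F1 = {100 * macro_f1(gold, pred):.2f} percent")
```

Because every class is weighted equally, a model that handles frequent emotions well but misses a rare one is penalized noticeably, which is why macro-F1 is a common choice for imbalanced emotion datasets.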