Paper title
Generating Counter Narratives against Online Hate Speech: Data and Strategies
Paper authors
Paper abstract
Recently, research has started focusing on avoiding the undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from spreading further. Accordingly, automation strategies, such as natural language generation, are beginning to be investigated. Still, they suffer from the lack of a sufficient amount of quality data and tend to produce generic/repetitive responses. Being aware of the aforementioned limitations, we present a study on how to collect responses to hate effectively, employing large-scale unsupervised language models such as GPT-2 for the generation of silver data, and the best annotation strategies/neural architectures that can be used for data filtering before expert validation/post-editing.