Paper Title
Automatic Detection of Machine Generated Text: A Critical Survey
Paper Authors
Abstract
Text generative models (TGMs) excel at producing text that matches the style of human language reasonably well. Such TGMs can be misused by adversaries, e.g., by automatically generating fake news and fake product reviews that look authentic and fool humans. Detectors that can distinguish TGM-generated text from human-written text play a vital role in mitigating such misuse. Recently, there has been a flurry of work from both the natural language processing (NLP) and machine learning (ML) communities on building accurate detectors for English. Despite the importance of this problem, there is currently no work that surveys this fast-growing literature and introduces newcomers to its important research challenges. In this work, we fill this void by providing a critical survey and review of this literature to facilitate a comprehensive understanding of the problem. We conduct an in-depth error analysis of the state-of-the-art detector and discuss research directions to guide future work in this exciting area.