Title
#Election2020: The First Public Twitter Dataset on the 2020 US Presidential Election
Authors
Abstract
The integrity of democratic political discourse is central to guaranteeing free and fair elections. With social media often dictating the tone and trends of politics-related discussion, it is of paramount importance to be able to study online chatter, especially in the run-up to important voting events such as the upcoming November 3, 2020 U.S. Presidential Election. Limited access to social media data is often the first barrier that impedes or slows down progress and, ultimately, our understanding of online political discourse. To mitigate this issue and to empower the Computational Social Science research community, we decided to publicly release a massive-scale, longitudinal dataset of U.S. politics- and election-related tweets. This multilingual dataset, which we have been collecting for over one year, encompasses hundreds of millions of tweets and tracks all salient U.S. politics trends, actors, and events between 2019 and 2020. It predates and spans the whole period of the Republican and Democratic primaries, with real-time tracking of all presidential contenders on both sides of the aisle. After the primaries, it focuses on the presidential and vice-presidential candidates. Our dataset release is curated and documented, and it will be updated weekly until the November 3, 2020 election and beyond. We hope that the academic community, computational journalists, and research practitioners alike will take advantage of our dataset to study relevant scientific and social issues, including misinformation, information manipulation, interference, and distortion of online political discourse, problems that have been prevalent in recent election events in the United States and worldwide. Our dataset is available at: https://github.com/echen102/us-pres-elections-2020