Paper Title

Social Media Mining Toolkit (SMMT)

Authors

Ramya Tekumalla, Juan M. Banda

Abstract

The use of social media data for research purposes has grown dramatically in popularity within the biomedical community. In PubMed alone, nearly 2,500 publication entries since 2014 deal with analyzing social media data from Twitter and Reddit. However, the vast majority of these works do not share the code or data needed to replicate their studies. Even among the few that do, most place the burden on the researcher to figure out how to fetch the data, how best to format it, and how to create automatic and manual annotations on the acquired data. To address this pressing issue, we introduce the Social Media Mining Toolkit (SMMT), a suite of tools aimed at encapsulating the cumbersome details of acquiring, preprocessing, annotating and standardizing social media data. The purpose of our toolkit is to let researchers focus on answering research questions rather than on the technical aspects of using social media data. By using a standard toolkit, researchers will be able to acquire, use, and release data in a consistent way that is transparent to everyone using the toolkit, thereby simplifying research reproducibility and accessibility in the social media domain.