Title
Storing, preprocessing and analyzing Tweets: Finding the suitable NoSQL system
Authors
Abstract
NoSQL systems are a new generation of databases designed to handle large volumes of data. However, there are many NoSQL systems, each with its own characteristics. Consequently, choosing a suitable NoSQL system to handle Tweets is challenging. Motivated by this, this work seeks the suitable NoSQL system for storing, preprocessing and analyzing Tweets. This paper presents the requirements of managing Tweets and provides a detailed comparison of five of the most popular NoSQL systems, namely Redis, Cassandra, MongoDB, Couchbase and Neo4j, with respect to these requirements. The results show that MongoDB and Couchbase are the most suitable NoSQL systems for storing, preprocessing and analyzing Tweets. Unlike related works, this work compares five NoSQL systems of different types in a real scenario: Tweet storing, preprocessing and analyzing. The chosen scenario makes it possible to evaluate not only the performance of read and write operations, but also other requirements related to Tweet management, such as scalability, analysis tools support and analysis languages support.