Paper Title

A Foundation for Spatio-Textual-Temporal Cube Analytics (Extended Version)

Authors

Mohsin Iqbal, Matteo Lissandrini, Torben Bach Pedersen

Abstract

Large amounts of spatial, textual, and temporal data are being produced daily. This is data containing an unstructured component (text), a spatial component (geographic position), and a time component (timestamp). Therefore, there is a need for a powerful and general way of analyzing spatial, textual, and temporal data together. In this paper, we define and formalize the Spatio-Textual-Temporal Cube structure to enable combined effective and efficient analytical queries over spatial, textual, and temporal data. Our novel data model over spatio-textual-temporal objects enables novel joint and integrated spatial, textual, and temporal insights that are hard to obtain using existing methods. Moreover, we introduce the new concept of spatio-textual-temporal measures with associated novel spatio-textual-temporal-OLAP operators. To allow for efficient large-scale analytics, we present a pre-aggregation framework for the exact and approximate computation of spatio-textual-temporal measures. Our comprehensive experimental evaluation on a real-world Twitter dataset confirms that our proposed methods reduce query response time by 1-5 orders of magnitude compared to the No Materialization baseline and decrease storage cost between 97% and 99.9% compared to the Full Materialization baseline while adding only a negligible overhead in the Spatio-Textual-Temporal Cube construction time. Moreover, approximate computation achieves an accuracy between 90% and 100% while reducing query response time by 3-5 orders of magnitude compared to No Materialization.
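To make the idea of pre-aggregation over spatio-textual-temporal objects more concrete, here is a minimal sketch in Python. It is not the paper's framework, whose details the abstract does not give. The sample objects, the two-level hierarchies (city to country, day to month), and the function names `build_base_cells`, `roll_up`, and `top_k_terms` are all assumptions made for illustration. The sketch stores term counts once per fine-grained cell and answers a coarser query by merging stored counts instead of rescanning the raw text.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy objects: (city, country, day, text). Not from the paper.
OBJECTS = [
    ("Aalborg", "Denmark", date(2020, 1, 3), "snow storm today"),
    ("Aalborg", "Denmark", date(2020, 1, 3), "storm warning"),
    ("Copenhagen", "Denmark", date(2020, 1, 20), "storm at the harbour"),
    ("Berlin", "Germany", date(2020, 1, 20), "concert tonight"),
]


def build_base_cells(objects):
    """Pre-aggregate term counts at the finest level: (country, city, day)."""
    cells = defaultdict(Counter)
    for city, country, day, text in objects:
        cells[(country, city, day)].update(text.lower().split())
    return cells


def roll_up(cells):
    """Roll up to (country, (year, month)) by merging the stored counters."""
    coarse = defaultdict(Counter)
    for (country, _city, day), counts in cells.items():
        coarse[(country, (day.year, day.month))].update(counts)
    return coarse


def top_k_terms(counter, k):
    """An illustrative textual measure: the k most frequent terms in a cell."""
    return [term for term, _ in counter.most_common(k)]


base = build_base_cells(OBJECTS)   # computed once, ahead of query time
monthly = roll_up(base)            # answered from pre-aggregated cells
denmark_jan = monthly[("Denmark", (2020, 1))]
print(top_k_terms(denmark_jan, 1))
```

Term counts are a convenient measure for this kind of sketch because they are distributive: the counts for a coarse cell are exactly the sum of the counts of its child cells, so a roll-up never needs the original text.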
